Method for breast cancer recurrence prediction under endocrine treatment

ABSTRACT

The present invention relates to methods, kits and systems for the prognosis of the disease outcome of breast cancer, said method comprising:
(a) determining in a tumor sample from said patient the RNA expression levels of at least 2 of the following 9 genes: UBE2C, BIRC5, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP;

(b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient; and kits and systems for performing said method.

TECHNICAL FIELD

The present invention relates to methods, kits and systems for the prognosis of the disease outcome of breast cancer. More specifically, the present invention relates to the prognosis of breast cancer based on measurements of the expression levels of marker genes in tumor samples of breast cancer patients.

BACKGROUND OF THE INVENTION

Breast cancer is one of the leading causes of cancer death in women in western countries. More specifically, breast cancer claims the lives of approximately 40,000 women and is diagnosed in approximately 200,000 women annually in the United States alone. Over the last few decades, adjuvant systemic therapy has led to markedly improved survival in early breast cancer. This clinical experience has led to consensus recommendations offering adjuvant systemic therapy for the vast majority of breast cancer patients (EBCAG). In breast cancer a multitude of treatment options are available which can be applied in addition to the routinely performed surgical removal of the tumor and subsequent radiation of the tumor bed. Three main and conceptually different strategies are endocrine treatment, chemotherapy and treatment with targeted therapies. A prerequisite for treatment with endocrine agents is expression of hormone receptors in the tumor tissue, i.e. either estrogen receptor, progesterone receptor or both. Several endocrine agents with different modes of action and differences in disease outcome when tested in large patient cohorts are available. Tamoxifen has been the mainstay of endocrine treatment for the last three decades. Large clinical trials showed that tamoxifen significantly reduced the risk of tumor recurrence. An additional treatment option is based on aromatase inhibitors, which belong to a new endocrine drug class. In contrast to tamoxifen, which is a competitive inhibitor of estrogen binding, aromatase inhibitors block the production of estrogen itself, thereby reducing the growth stimulus for estrogen receptor positive tumor cells. Still, some patients experience a relapse despite endocrine treatment, and in particular these patients might benefit from additional therapeutic drugs. Chemotherapy with anthracyclines, taxanes and other agents has been shown to be efficient in reducing disease recurrence in estrogen receptor positive as well as estrogen receptor negative patients. The NSABP-20 study compared tamoxifen alone against tamoxifen plus chemotherapy in node negative estrogen receptor positive patients and showed that the combined treatment was more effective than tamoxifen alone. However, the IBCSG IX study comparing tamoxifen alone against tamoxifen plus chemotherapy failed to show any significant benefit for the addition of cytotoxic agents. Recently, a systemically administered antibody directed against the HER2/neu antigen on the surface of tumor cells has been shown to reduce the risk of recurrence several fold in patients with HER2/neu overexpressing tumors. Yet, most if not all of the different drug treatments have numerous potential adverse effects which can severely impair patients' quality of life (Shapiro and Recht, 2001; Ganz et al., 2002). This makes it mandatory to select the treatment strategy on the basis of a careful risk assessment for the individual patient to avoid over- as well as undertreatment. Since the benefit of chemotherapy is relatively large in HER2/neu positive tumors and in tumors characterized by absence of HER2/neu and estrogen receptor expression (basal type), compared to HER2/neu negative and estrogen receptor positive tumors (luminal type), the most challenging treatment decision concerns luminal tumors, for which classical clinical factors like grading, tumor size or lymph node involvement do not provide a clear answer to the question of whether to use chemotherapy or not. Newer molecular tools like a 21 gene assay, a genomic grade index assay and others have been developed to address this medical need.

Treatment guidelines are usually developed by renowned experts in the field. In Europe the St Gallen guidelines from the year 2009 recommend chemotherapy to patients with HER2 positive breast cancer as well as to patients with HER2 negative and ER negative disease. Uncertainty about the usefulness of chemotherapy arises in patients with HER2 negative and ER positive disease. In order to make a balanced treatment decision for the individual, the likelihood of cancer recurrence is used as the most useful criterion. Clinical criteria like lymph node status, tumor grading, tumor size and others are helpful since they provide information about the risk of recurrence. More recently, multigene assays have been shown to provide information superior or additional to the standard clinical risk factors. It is generally recognized that proliferation markers seem to provide the dominant prognostic information. Prominent examples of those predictors are the Mammaprint test from Agendia, the Relapse Score from Veridex and the Genomic Grade Index, developed at the institute Jules Bordet and licensed to Ipsogen. All of these assays are based on determination of the expression levels of at least 70 genes and all have been developed for RNA not heavily degraded by formalin fixation and paraffin embedding, but isolated from fresh tissue (shipped in RNALater™). Another prominent multigene assay is the Recurrence Score test of Genomic Health Inc. The test determines the expression level of 16 cancer related genes and 5 reference genes after RNA extraction from formalin fixed and paraffin embedded tissue samples.

However, the current tools suffer from a lack of clinical validity and utility in the most important clinical risk group, i.e. those breast cancer patients of intermediate risk of recurrence based on standard clinical parameters. Therefore, better tools are needed to optimize treatment decisions based on patient prognosis. For the clinical utility of avoiding chemotherapy, a test with a high sensitivity and high negative predictive value is needed, in order not to undertreat a patient who eventually develops a distant metastasis after surgery. In regard to the continuing need for materials and methods useful in making clinical decisions on adjuvant therapy, the present invention fulfills the need for advanced methods for the prognosis of breast cancer on the basis of readily accessible clinical and experimental data.

Definitions

Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The term “cancer” is not limited to any stage, grade, histomorphological feature, aggressivity, or malignancy of an affected tissue or cell aggregation.

The term “predicting an outcome” of a disease, as used herein, is meant to include both a prediction of an outcome of a patient undergoing a given therapy and a prognosis of a patient who is not treated. The term “predicting an outcome” may, in particular, relate to the risk of a patient developing metastasis, local recurrence or death.

The term “prediction”, as used herein, relates to an individual assessment of the malignancy of a tumor, or to the expected survival rate (OAS, overall survival or DFS, disease free survival) of a patient, if the tumor is treated with a given therapy. In contrast thereto, the term “prognosis” relates to an individual assessment of the malignancy of a tumor, or to the expected survival rate (OAS, overall survival or DFS, disease free survival) of a patient, if the tumor remains untreated.

An “outcome” within the meaning of the present invention is a defined condition attained in the course of the disease. This disease outcome may e.g. be a clinical condition such as “recurrence of disease”, “development of metastasis”, “development of nodal metastasis”, “development of distant metastasis”, “survival”, “death”, “tumor remission rate”, a disease stage or grade or the like.

A “risk” is understood to be a number related to the probability of a subject or a patient to develop or arrive at a certain disease outcome. The term “risk” in the context of the present invention is not meant to carry any positive or negative connotation with regard to a patient's wellbeing but merely refers to a probability or likelihood of an occurrence or development of a given condition.

The term “clinical data” relates to the entirety of available data and information concerning the health status of a patient including, but not limited to, age, sex, weight, menopausal/hormonal status, etiopathology data, anamnesis data, data obtained by in vitro diagnostic methods such as histopathology, blood or urine tests, data obtained by imaging methods, such as x-ray, computed tomography, MRI, PET, SPECT, ultrasound, electrophysiological data, genetic analysis, gene expression analysis, biopsy evaluation, and intraoperative findings.

The term “node positive”, “diagnosed as node positive”, “node involvement” or “lymph node involvement” means a patient having previously been diagnosed with lymph node metastasis. It shall encompass both draining lymph node, near lymph node, and distant lymph node metastasis. This previous diagnosis itself shall not form part of the inventive method. Rather it is a precondition for selecting patients whose samples may be used for one embodiment of the present invention. This previous diagnosis may have been arrived at by any suitable method known in the art, including, but not limited to lymph node removal and pathological analysis, biopsy analysis, in-vitro analysis of biomarkers indicative for metastasis, imaging methods (e.g. computed tomography, X-ray, magnetic resonance imaging, ultrasound), and intraoperative findings.

In the context of the present invention a “biological sample” is a sample which is derived from or has been in contact with a biological organism. Examples for biological samples are: cells, tissue, body fluids, lavage fluid, smear samples, biopsy specimens, blood, urine, saliva, sputum, plasma, serum, cell culture supernatant, and others.

A “tumor sample” is a biological sample containing tumor cells, whether intact or degraded. The sample may be of any biological tissue or fluid. Such samples include, but are not limited to, sputum, blood, serum, plasma, blood cells (e.g., white cells), tissue, core or fine needle biopsy samples, cell-containing body fluids, urine, peritoneal fluid, and pleural fluid, liquor cerebrospinalis, tear fluid, or cells isolated therefrom. This may also include sections of tissues such as frozen or fixed sections taken for histological purposes or microdissected cells or extracellular parts thereof. A tumor sample to be analyzed can be tissue material from a neoplastic lesion taken by aspiration or puncture, excision or by any other surgical method leading to biopsy or resected cellular material. Such material comprises tumor cells or tumor cell fragments obtained from the patient. The cells may be found in a cell “smear” collected, for example, by a nipple aspiration, ductal lavage, fine needle biopsy or from provoked or spontaneous nipple discharge. In another embodiment, the sample is a body fluid. Such fluids include, for example, blood fluids, serum, plasma, lymph, ascitic fluids, gynecologic fluids, or urine, but are not limited to these fluids.

A “gene” is a set of segments of nucleic acid that contains the information necessary to produce a functional RNA product. A “gene product” is a biological molecule produced through transcription or expression of a gene, e.g. an mRNA, cDNA or the translated protein.

An “mRNA” is the transcribed product of a gene and shall have the ordinary meaning understood by a person skilled in the art. A “molecule derived from an mRNA” is a molecule which is chemically or enzymatically obtained from an mRNA template, such as cDNA.

The term “expression level” refers to a determined level of gene expression. This may be a determined level of gene expression as an absolute value or compared to a reference gene (e.g. a housekeeping gene), to the average of two or more reference genes, or to a computed average expression value (e.g. in DNA chip analysis) or to another informative gene without the use of a reference sample. The expression level of a gene may be measured directly, e.g. by obtaining a signal wherein the signal strength is correlated to the amount of mRNA transcripts of that gene or it may be obtained indirectly at a protein level, e.g. by immunohistochemistry, CISH, ELISA or RIA methods. The expression level may also be obtained by way of a competitive reaction to a reference sample. An expression value which is determined by measuring some physical parameter in an assay, e.g. fluorescence emission, may be assigned a numerical value which may be used for further processing of information.
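The comparison of a gene to the average of two or more reference genes can be illustrated with a minimal sketch. It assumes, purely for illustration, that expression is read out as qPCR threshold cycle (Ct) values and that the relative level is defined as the mean reference Ct minus the gene Ct; the function name `relative_expression` and the Ct values are hypothetical and are not taken from the present description.

```python
from statistics import mean


def relative_expression(ct_gene, ct_references):
    """Return the expression of a gene relative to one or more reference genes.

    Assumption: a lower Ct means more template, so the value is defined as
    mean(reference Ct) - gene Ct. Larger values then mean higher expression.
    """
    if not ct_references:
        raise ValueError("at least one reference gene Ct is required")
    return mean(ct_references) - ct_gene


# Hypothetical Ct values, for illustration only.
delta = relative_expression(ct_gene=26.0, ct_references=[24.0, 25.0])
print(delta)  # -1.5
```

Other normalizations, such as a computed average over all measured genes, would fit the same definition equally well.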

A “reference pattern of expression levels”, within the meaning of the invention shall be understood as being any pattern of expression levels that can be used for the comparison to another pattern of expression levels. In a preferred embodiment of the invention, a reference pattern of expression levels is, e.g., an average pattern of expression levels observed in a group of healthy individuals, diseased individuals, or diseased individuals having received a particular type of therapy, serving as a reference group, or individuals with good or bad outcome.

The term “mathematically combining expression levels”, within the meaning of the invention shall be understood as deriving a numeric value from a determined expression level of a gene and applying an algorithm to one or more of such numeric values to obtain a combined numerical value or combined score.

An “algorithm” is a process that performs some sequence of operations to produce information.

A “score” is a numeric value that was derived by mathematically combining expression levels using an algorithm. It may also be derived from expression levels and other information, e.g. clinical data. A score may be related to the outcome of a patient's disease.

A “discriminant function” is a function of a set of variables used to classify an object or event. A discriminant function thus allows classification of a patient, sample or event into a category or a plurality of categories according to data or parameters available from said patient, sample or event. Such classification is a standard instrument of statistical analysis well known to the skilled person. E.g. a patient may be classified as “high risk” or “low risk”, “high probability of metastasis” or “low probability of metastasis”, “in need of treatment” or “not in need of treatment” according to data obtained from said patient, sample or event. Classification is not limited to “high vs. low”, but may be performed into a plurality of categories, grading or the like. Classification shall also be understood in a wider sense as a discriminating score, where e.g. a higher score represents a higher likelihood of distant metastasis, e.g. the (overall) risk of a distant metastasis. Examples for discriminant functions which allow a classification include, but are not limited to functions defined by support vector machines (SVM), k-nearest neighbors (kNN), (naive) Bayes models, linear regression models or piecewise defined functions such as, for example, in subgroup discovery, in decision trees, in logical analysis of data (LAD) and the like. In a wider sense, continuous score values of mathematical methods or algorithms, such as correlation coefficients, projections, support vector machine scores, other similarity-based methods, combinations of these and the like are examples for illustrative purpose.
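A piecewise defined discriminant function of the kind mentioned above can be sketched as follows. The cutoff values, the three category labels and the function name `classify_risk` are assumptions chosen for illustration; the description does not specify them.

```python
def classify_risk(score, cutoffs=(5.0, 10.0)):
    """Map a continuous score to a risk category.

    Assumption: a higher score means a higher likelihood of distant
    metastasis, and the two cutoffs are hypothetical placeholders.
    """
    low_cutoff, high_cutoff = cutoffs
    if low_cutoff > high_cutoff:
        raise ValueError("cutoffs must be given in ascending order")
    if score < low_cutoff:
        return "low risk"
    if score < high_cutoff:
        return "intermediate risk"
    return "high risk"


# Hypothetical scores, for illustration only.
for value in (3.2, 7.5, 12.0):
    print(value, classify_risk(value))
```

A two-category “high vs. low” classifier is the special case in which both cutoffs coincide.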

The term “therapy modality”, “therapy mode”, “regimen” as well as “therapy regimen” refers to a timely sequential or simultaneous administration of anti-tumor, and/or anti vascular, and/or immune stimulating, and/or blood cell proliferative agents, and/or radiation therapy, and/or hyperthermia, and/or hypothermia for cancer therapy. The administration of these can be performed in an adjuvant and/or neoadjuvant mode. The composition of such “protocol” may vary in the dose of the single agent, timeframe of application and frequency of administration within a defined therapy window. Currently various combinations of various drugs and/or physical methods, and various schedules are under investigation.

The term “cytotoxic chemotherapy” refers to various treatment modalities affecting cell proliferation and/or survival. The treatment may include administration of alkylating agents, antimetabolites, anthracyclines, plant alkaloids, topoisomerase inhibitors, and other antitumor agents, including monoclonal antibodies and kinase inhibitors. In particular, the cytotoxic treatment may relate to a taxane treatment. Taxanes are plant alkaloids which block cell division by preventing microtubule function. The prototype taxane is the natural product paclitaxel, originally known as Taxol and first derived from the bark of the Pacific Yew tree. Docetaxel is a semi-synthetic analogue of paclitaxel. Taxanes enhance stability of microtubules, preventing the separation of chromosomes during anaphase.

The term “endocrine treatment” or “hormonal treatment” (sometimes also referred to as “anti-hormonal treatment”) denotes a treatment which targets hormone signaling, e.g. hormone inhibition, hormone receptor inhibition, use of hormone receptor agonists or antagonists, use of scavenger- or orphan receptors, use of hormone derivatives and interference with hormone production. Particular examples are tamoxifen therapy, which modulates signaling of the estrogen receptor, or aromatase inhibitor treatment, which interferes with steroid hormone production.

Tamoxifen is an orally active selective estrogen receptor modulator (SERM) that is used in the treatment of breast cancer and is currently the world's largest selling drug for that purpose. Tamoxifen is sold under the trade names Nolvadex, Istubal, and Valodex. However, the drug, even before its patent expiration, was and still is widely referred to by its generic name “tamoxifen.” Tamoxifen and tamoxifen derivatives competitively bind to estrogen receptors on tumors and other tissue targets, producing a nuclear complex that decreases RNA synthesis and inhibits estrogen effects.

Steroid receptors are intracellular receptors (typically cytoplasmic) that perform signal transduction for steroid hormones. Examples include type I receptors, in particular sex hormone receptors, e.g. androgen receptor, estrogen receptor, progesterone receptor; glucocorticoid receptor, mineralocorticoid receptor; and type II receptors, e.g. vitamin A receptor, vitamin D receptor, retinoid receptor, thyroid hormone receptor.

The term “hybridization-based method”, as used herein, refers to methods involving a process of combining complementary, single-stranded nucleic acids or nucleotide analogues into a single double stranded molecule. Nucleotides or nucleotide analogues will bind to their complement under normal conditions, so two perfectly complementary strands will bind to each other readily. In bioanalytics, very often labeled, single stranded probes are used in order to find complementary target sequences. If such sequences exist in the sample, the probes will hybridize to said sequences which can then be detected due to the label. Other hybridization based methods comprise microarray and/or biochip methods. Therein, probes are immobilized on a solid phase, which is then exposed to a sample. If complementary nucleic acids exist in the sample, these will hybridize to the probes and can thus be detected. These approaches are also known as “array based methods”. Yet another hybridization based method is PCR, which is described below. When it comes to the determination of expression levels, hybridization based methods may for example be used to determine the amount of mRNA for a given gene.

An oligonucleotide capable of specifically binding sequences of a gene or fragments thereof relates to an oligonucleotide which specifically hybridizes to a gene or gene product, such as the gene's mRNA or cDNA or to a fragment thereof. To specifically detect the gene or gene product, it is not necessary to detect the entire gene sequence. A fragment of about 20-150 bases will contain enough sequence specific information to allow specific hybridization.

The term “a PCR based method” as used herein refers to methods comprising a polymerase chain reaction (PCR). This is a method of exponentially amplifying nucleic acids, e.g. DNA by enzymatic replication in vitro. As PCR is an in vitro technique, it can be performed without restrictions on the form of DNA, and it can be extensively modified to perform a wide array of genetic manipulations. When it comes to the determination of expression levels, a PCR based method may for example be used to detect the presence of a given mRNA by (1) reverse transcription of the complete mRNA pool (the so called transcriptome) into cDNA with help of a reverse transcriptase enzyme, and (2) detecting the presence of a given cDNA with help of respective primers. This approach is commonly known as reverse transcriptase PCR (rtPCR).

Moreover, PCR-based methods comprise e.g. real time PCR, and, particularly suited for the analysis of expression levels, kinetic or quantitative PCR (qPCR).

The term “quantitative PCR” (qPCR) refers to any type of PCR method which allows the quantification of the template in a sample. Quantitative real-time PCR comprises different techniques of performance or product detection, for example the TaqMan technique or the LightCycler technique. The TaqMan technique, for example, uses a dual-labelled fluorogenic probe. The TaqMan real-time PCR measures accumulation of a product via the fluorophore during the exponential stages of the PCR, rather than at the end point as in conventional PCR. The exponential increase of the product is used to determine the threshold cycle, CT, i.e. the number of PCR cycles at which a significant exponential increase in fluorescence is detected, and which is directly correlated with the number of copies of DNA template present in the reaction. The setup of the reaction is very similar to a conventional PCR, but is carried out in a real-time thermal cycler that allows measurement of fluorescent molecules in the PCR tubes. Different from regular PCR, in TaqMan real-time PCR a probe is added to the reaction, i.e., a single-stranded oligonucleotide complementary to a segment of 20-60 nucleotides within the DNA template and located between the two primers. A fluorescent reporter or fluorophore (e.g., 6-carboxyfluorescein, acronym: FAM, or tetrachlorofluorescein, acronym: TET) and quencher (e.g., tetramethylrhodamine, acronym: TAMRA, or dihydrocyclopyrroloindole tripeptide ‘black hole quencher’, acronym: BHQ) are covalently attached to the 5′ and 3′ ends of the probe, respectively. The close proximity between fluorophore and quencher attached to the probe inhibits fluorescence from the fluorophore. During PCR, as DNA synthesis commences, the 5′ to 3′ exonuclease activity of the Taq polymerase degrades that proportion of the probe that has annealed to the template. Degradation of the probe releases the fluorophore from it and breaks the close proximity to the quencher, thus relieving the quenching effect and allowing fluorescence of the fluorophore. Hence, fluorescence detected in the real-time PCR thermal cycler is directly proportional to the fluorophore released and the amount of DNA template present in the PCR.
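How a threshold cycle can be read from a fluorescence curve may be sketched as below. This is a simplified illustration that assumes one baseline-corrected reading per cycle and a fixed, user-chosen threshold; real instruments apply their own baseline correction and curve fitting, which are not described here. The function name `threshold_cycle` and the readings are hypothetical.

```python
def threshold_cycle(fluorescence, threshold):
    """Return the fractional cycle at which fluorescence first reaches threshold.

    fluorescence[i] is the reading after cycle i + 1. The crossing point is
    linearly interpolated between the two neighbouring cycles. Returns None
    if the threshold is never reached.
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            if i == 0:
                return 1.0  # already at or above threshold after the first cycle
            previous = fluorescence[i - 1]
            # previous < threshold <= value, so the denominator is positive.
            return i + (threshold - previous) / (value - previous)
    return None


# Hypothetical readings that double every cycle, for illustration only.
readings = [1.0, 2.0, 4.0, 8.0, 16.0]
print(threshold_cycle(readings, threshold=6.0))  # 3.5
```

A sample with more starting template crosses the threshold earlier and therefore yields a lower CT value.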

By “array” or “matrix” an arrangement of addressable locations or “addresses” on a device is meant. The locations can be arranged in two dimensional arrays, three dimensional arrays, or other matrix formats. The number of locations can range from several to at least hundreds of thousands. Most importantly, each location represents a totally independent reaction site. Arrays include but are not limited to nucleic acid arrays, protein arrays and antibody arrays. A “nucleic acid array” refers to an array containing nucleic acid probes, such as oligonucleotides, nucleotide analogues, polynucleotides, polymers of nucleotide analogues, morpholinos or larger portions of genes. The nucleic acid and/or analogue on the array is preferably single stranded. Arrays wherein the probes are oligonucleotides are referred to as “oligonucleotide arrays” or “oligonucleotide chips.” A “microarray,” herein also refers to a “biochip” or “biological chip”, an array of regions having a density of discrete regions of at least about 100/cm², and preferably at least about 1000/cm².

“Primer pairs” and “probes”, within the meaning of the invention, shall have the ordinary meaning of this term which is well known to the person skilled in the art of molecular biology. In a preferred embodiment of the invention “primer pairs” and “probes”, shall be understood as being polynucleotide molecules having a sequence identical, complementary, homologous, or homologous to the complement of regions of a target polynucleotide which is to be detected or quantified. In yet another embodiment, nucleotide analogues are also comprised for usage as primers and/or probes. Probe technologies used for kinetic or real time PCR applications could be e.g. TaqMan® systems obtainable at Applied Biosystems, extension probes such as Scorpion® Primers, Dual Hybridisation Probes, Amplifluor® obtainable at Chemicon International, Inc, or Minor Groove Binders.

“Individually labeled probes”, within the meaning of the invention, shall be understood as being molecular probes comprising a polynucleotide, oligonucleotide or nucleotide analogue and a label, helpful in the detection or quantification of the probe. Preferred labels are fluorescent molecules, luminescent molecules, radioactive molecules, enzymatic molecules and/or quenching molecules.

“Arrayed probes”, within the meaning of the invention, shall be understood as being a collection of immobilized probes, preferably in an orderly arrangement. In a preferred embodiment of the invention, the individual “arrayed probes” can be identified by their respective position on the solid support, e.g., on a “chip”.

When used in reference to a single-stranded nucleic acid sequence, the term “substantially homologous” refers to any probe that can hybridize to (i.e., it is the complement of) the single-stranded nucleic acid sequence under conditions of low stringency as described above.

SUMMARY OF THE INVENTION

In general terms, the present invention provides a method to assess the risk of recurrence of a node negative or positive, estrogen receptor positive and HER2/NEU negative breast cancer patient, in particular patients receiving endocrine therapy, for example when treated with tamoxifen. Estrogen receptor status is generally determined using immunohistochemistry, HER2/NEU (ERBB2) status is generally determined using immunohistochemistry and fluorescence in situ hybridization. However, estrogen receptor status and HER2/NEU (ERBB2) status may, for the purposes of the invention, be determined by any suitable method, e.g. immunohistochemistry, fluorescence in situ hybridization (FISH), or RNA expression analysis.

The present invention relates to a method for predicting an outcome of breast cancer in an estrogen receptor positive and HER2 negative tumor of a breast cancer patient, said method comprising:

(a) determining in a tumor sample from said patient the RNA expression levels of at least 2 of the following 9 genes: UBE2C, BIRC5, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP;

(b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient. In one embodiment at least 3, 4, 5 or 6 genes are selected.

In a further embodiment of the invention the method comprises:

(a) determining in a tumor sample from said patient the RNA expression levels of the following 8 genes: UBE2C, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP;

(b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient.

In a further embodiment the method of the invention comprises:

(a) determining in a tumor sample from said patient the RNA expression levels of the following 8 genes: UBE2C, BIRC5, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP;

(b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient.

In yet another embodiment of the invention

BIRC5 may be replaced by UBE2C or TOP2A or RACGAP1 or AURKA or NEK2 or E2F8 or PCNA or CYBRD1 or DCN or ADRA2A or SQLE or CXCL12 or EPHX2 or ASPH or PRSS16 or EGFR or CCND1 or TRIM29 or DHCR7 or PIP or TFAP2B or WNT5A or APOD or PTPRT with the proviso that after a replacement 8 different genes are selected; and

UBE2C may be replaced by BIRC5 or RACGAP1 or TOP2A or AURKA or NEK2 or E2F8 or PCNA or CYBRD1 or ADRA2A or DCN or SQLE or CCND1 or ASPH or CXCL12 or PIP or PRSS16 or EGFR or DHCR7 or EPHX2 or TRIM29 with the proviso that after a replacement 8 different genes are selected; and

DHCR7 may be replaced by AURKA, BIRC5, UBE2C or by any other gene that may replace BIRC5 or UBE2C with the proviso that after a replacement 8 different genes are selected; and

STC2 may be replaced by INPP4B or IL6ST or SEC14L2 or MAPT or CHPT1 or ABAT or SCUBE2 or ESR1 or RBBP8 or PGR or PTPRT or HSPA2 or PTGER3 with the proviso that after a replacement 8 different genes are selected; and

AZGP1 may be replaced by PIP or EPHX2 or PLAT or SEC14L2 or SCUBE2 or PGR with the proviso that after a replacement 8 different genes are selected; and

RBBP8 may be replaced by CELSR2 or PGR or STC2 or ABAT or IL6ST with the proviso that after a replacement 8 different genes are selected; and

IL6ST may be replaced by INPP4B or STC2 or MAPT or SCUBE2 or ABAT or PGR or SEC14L2 or ESR1 or GJA1 or MGP or EPHX2 or RBBP8 or PTPRT or PLAT with the proviso that after a replacement 8 different genes are selected; and

MGP may be replaced by APOD or IL6ST or EGFR with the proviso that after a replacement 8 different genes are selected.
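The proviso that 8 different genes remain after a replacement can be checked mechanically. The sketch below assumes the 8-gene set UBE2C, BIRC5, DHCR7, STC2, AZGP1, RBBP8, IL6ST and MGP as the starting panel and transcribes only three of the replacement lists above to stay short; the names `BASE_PANEL`, `REPLACEMENTS` and `apply_replacements` are hypothetical.

```python
BASE_PANEL = ("UBE2C", "BIRC5", "DHCR7", "STC2", "AZGP1", "RBBP8", "IL6ST", "MGP")

# Only three replacement lists are transcribed here; the remaining genes
# would be added in the same way.
REPLACEMENTS = {
    "AZGP1": {"PIP", "EPHX2", "PLAT", "SEC14L2", "SCUBE2", "PGR"},
    "RBBP8": {"CELSR2", "PGR", "STC2", "ABAT", "IL6ST"},
    "MGP": {"APOD", "IL6ST", "EGFR"},
}


def apply_replacements(panel, swaps):
    """Return the panel after the requested swaps.

    swaps maps an original gene to its substitute. A ValueError is raised
    if a substitute is not listed for that gene or if the resulting panel
    no longer consists of different genes.
    """
    new_panel = []
    for gene in panel:
        substitute = swaps.get(gene, gene)
        if substitute != gene and substitute not in REPLACEMENTS.get(gene, set()):
            raise ValueError(f"{substitute} is not a listed replacement for {gene}")
        new_panel.append(substitute)
    if len(set(new_panel)) != len(panel):
        raise ValueError("replacement would select the same gene twice")
    return tuple(new_panel)


print(apply_replacements(BASE_PANEL, {"MGP": "APOD"}))
```

Replacing MGP by IL6ST in this panel is rejected, because IL6ST is already selected and only 7 different genes would remain.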

According to an aspect of the invention there is provided a method as described above, wherein said combined score is indicative of benefit from cytotoxic chemotherapy.

Using the method of the invention before a patient receives endocrine therapy allows a prediction of the efficacy of endocrine therapy.

Table 2 below shows whether the overexpression of each of the above marker genes is indicative of a good outcome or a bad outcome in a patient receiving endocrine therapy. The skilled person can thus construct a mathematical combination, i.e. an algorithm, taking into account the effect of a given gene. For example, a summation or weighted summation of genes whose overexpression is indicative of a good outcome results in an algorithm wherein a high score is indicative of a good outcome. The validity of the algorithm may be examined by analyzing tumor samples of patients with a clinical record, wherein e.g. the score for good outcome patients and bad outcome patients may be determined separately and compared. The skilled person, e.g. a biostatistician, will know to apply further mathematical methods, such as discriminant functions, to obtain optimized algorithms. Algorithms may be optimized e.g. for sensitivity or specificity. Algorithms may be adapted to the particular analytical platform used to measure gene expression of marker genes, such as quantitative PCR.
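Examining an algorithm against patients with a clinical record can be sketched by computing sensitivity and specificity for a chosen score cutoff. The sketch assumes, as a convention chosen here only, that a higher score means higher risk and that a patient is called high risk when the score reaches the cutoff; the function name, scores and outcomes are hypothetical.

```python
def sensitivity_specificity(scores, had_recurrence, threshold):
    """Evaluate a score cutoff against recorded outcomes.

    scores and had_recurrence are parallel sequences, one entry per patient.
    Returns (sensitivity, specificity).
    """
    if len(scores) != len(had_recurrence):
        raise ValueError("scores and outcomes must have the same length")
    tp = fn = tn = fp = 0
    for score, event in zip(scores, had_recurrence):
        predicted_high = score >= threshold
        if event and predicted_high:
            tp += 1
        elif event:
            fn += 1
        elif predicted_high:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome groups must contain at least one patient")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and outcomes for four patients, for illustration only.
print(sensitivity_specificity([2.0, 6.0, 4.0, 8.0], [False, False, True, True], 5.0))
# (0.5, 0.5)
```

Lowering the cutoff in such an evaluation raises sensitivity at the cost of specificity, which is the trade-off to be balanced when an algorithm is optimized for one or the other.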

According to an aspect of the invention there is provided a method as described above, wherein said endocrine therapy comprises tamoxifen or an aromatase inhibitor.

According to an aspect of the invention there is provided a method as described above, wherein a risk of developing recurrence is predicted.

According to an aspect of the invention there is provided a method as described above, wherein said expression level is determined as a non-protein expression level.

According to an aspect of the invention there is provided a method as described above, wherein said expression level is determined as an RNA expression level.

According to an aspect of the invention there is provided a method as described above, wherein said expression level is determined by at least one of

- a PCR based method,
- a microarray based method, and
- a hybridization based method.

According to an aspect of the invention there is provided a method as described above, wherein said determination of expression levels is in a formalin-fixed paraffin embedded tumor sample or in a fresh-frozen tumor sample.

According to an aspect of the invention there is provided a method as described above, wherein the expression level of said at least one marker gene is determined as a pattern of expression relative to at least one reference gene or to a computed average expression value.

According to an aspect of the invention there is provided a method as described above, wherein said step of mathematically combining comprises a step of applying an algorithm to values representative of an expression level of a given gene.

According to an aspect of the invention there is provided a method as described above, wherein said algorithm is a linear combination of said values representative of an expression level of a given gene.

According to an aspect of the invention there is provided a method as described above, wherein a value representative of an expression level of a given gene is multiplied with a coefficient.

According to an aspect of the invention there is provided a method as described above, wherein one, two or more thresholds are determined for said combined score and the combined score is discriminated into high and low risk, high, intermediate and low risk, or more risk groups by applying the thresholds to the combined score.
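The thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `risk_group`, the label strings and the cut-off values in the usage lines are hypothetical, and assigning a score that equals a cut-off to the lower group is an assumption.

```python
from bisect import bisect_left


def risk_group(score, thresholds, labels):
    """Map a combined score to a risk group.

    thresholds: one or more cut-offs on the combined score.
    labels: group names from lowest to highest risk; exactly one more
        label than thresholds is required.
    Assumption: a score equal to a cut-off falls into the lower group.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_left counts how many cut-offs lie strictly below the score.
    return labels[bisect_left(sorted(thresholds), score)]


# Hypothetical cut-offs, chosen only to show the two- and three-group cases.
two_groups = risk_group(0.4, [0.0], ["low", "high"])
three_groups = risk_group(1.5, [1.0, 2.0], ["low", "intermediate", "high"])
```

With one threshold the function yields two groups, with two thresholds three groups, and so on for any number of thresholds.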

According to an aspect of the invention there is provided a method as described above, wherein a high combined score is indicative of benefit from a more aggressive therapy, e.g. cytotoxic chemotherapy. The skilled person understands that a “high score” in this regard relates to a reference value or cutoff value. The skilled person further understands that depending on the particular algorithm used to obtain the combined score, also a “low” score below a cutoff or reference value can be indicative of benefit from a more aggressive therapy, e.g. cytotoxic chemotherapy. The former is the case when genes having a positive correlation with high risk of metastasis factor into the algorithm with a positive coefficient, such that an overall high score indicates high expression of genes having a positive correlation with high risk.

According to an aspect of the invention there is provided a method as described above, wherein information regarding the nodal status of the patient is processed in the step of mathematically combining expression level values for the genes to yield a combined score.

According to an aspect of the invention there is provided a method as described above, wherein said information regarding nodal status is a numerical value ≤ 0 if said nodal status is negative and said information is a numerical value > 0 if said nodal status is positive or unknown. In exemplary embodiments of the invention a negative nodal status is assigned the value 0, an unknown nodal status is assigned the value 0.5 and a positive nodal status is assigned the value 1. Other values may be chosen to reflect a different weighting of the nodal status within an algorithm.
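The exemplary nodal status coding can be illustrated with a short sketch. The values 0, 0.5 and 1 are taken from the text; the helper names `nodal_status_value` and `add_nodal_term` and the string labels are hypothetical, and the weight applied to the nodal term depends on the particular algorithm.

```python
# Exemplary coding from the text: negative = 0, unknown = 0.5, positive = 1.
NODAL_STATUS_VALUE = {"negative": 0.0, "unknown": 0.5, "positive": 1.0}


def nodal_status_value(status):
    """Return the numerical value for a nodal status label (hypothetical helper)."""
    key = status.strip().lower()
    if key not in NODAL_STATUS_VALUE:
        raise ValueError(f"unrecognized nodal status: {status!r}")
    return NODAL_STATUS_VALUE[key]


def add_nodal_term(gene_score, status, weight):
    """Add weight * nodal value to a gene-based score.

    The weight is algorithm-specific; it is a parameter here, not a
    value asserted by this sketch.
    """
    return gene_score + weight * nodal_status_value(status)
```

Choosing other values in the mapping changes how strongly the nodal status contributes relative to the gene terms.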

The invention further relates to a kit for performing a method as described above, said kit comprising a set of oligonucleotides capable of specifically binding to sequences or to sequences of fragments of the genes in a combination of genes, wherein

(i) said combination comprises at least the 8 genes UBE2C, BIRC5, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP; or

(ii) said combination comprises at least the 10 genes BIRC5, AURKA, PVALB, NMU, STC2, RBBP8, PTGER3, CXCL12, CDH1, and PIP; or

(iii) said combination comprises at least the 9 genes BIRC5, DHCR7, RACGAP1, PVALB, STC2, IL6ST, PTGER3, CXCL12, and ABAT; or

(iv) said combination comprises at least the 9 genes DHCR7, RACGAP1, NMU, AZGP1, RBBP8, IL6ST, and MGP.

The invention further relates to the use of a kit for performing a method of any of claims 1 to 17, said kit comprising a set of oligonucleotides capable of specifically binding to sequences or to sequences of fragments of the genes in a combination of genes, wherein

(i) said combination comprises at least the 8 genes UBE2C, BIRC5, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP; or

(ii) said combination comprises at least the 10 genes BIRC5, AURKA, PVALB, NMU, STC2, RBBP8, PTGER3, CXCL12, CDH1, and PIP; or

(iii) said combination comprises at least the 9 genes BIRC5, DHCR7, RACGAP1, PVALB, STC2, IL6ST, PTGER3, CXCL12, and ABAT; or

(iv) said combination comprises at least the 9 genes DHCR7, RACGAP1, NMU, AZGP1, RBBP8, IL6ST, and MGP.

A computer program product capable of processing values representative of an expression level of the genes AKR1C3, MAP4 and SPP1 by mathematically combining said values to yield a combined score, wherein said combined score is indicative of benefit from cytotoxic chemotherapy of said patient.

The invention further relates to a computer program product capable of processing values representative of an expression level of a combination of genes by mathematically combining said values to yield a combined score, wherein said combined score is indicative of efficacy or benefit from endocrine therapy of said patient, according to the above methods.

Said computer program product may be stored on a data carrier or implemented on a diagnostic system capable of outputting values representative of an expression level of a given gene, such as a real time PCR system.

If the computer program product is stored on a data carrier or running on a computer, operating personnel can input the expression values obtained for the expression level of the respective genes. The computer program product can then apply an algorithm to produce a combined score indicative of benefit from cytotoxic chemotherapy for a given patient.

The methods of the present invention have the advantage of providing a reliable prediction of an outcome of disease based on the use of only a small number of genes. The methods of the present invention have been found to be especially suited for analyzing the response to endocrine treatment, e.g. by tamoxifen, of patients with tumors classified as ESR1 positive and ERBB2 negative.

DETAILED DESCRIPTION OF THE INVENTION

The invention is explained in conjunction with exemplary embodiments and the attached figures:

FIG. 1 shows a forest plot of the adjusted hazard unit ratio with 95% confidence interval of the T5 score in the combined cohort, as well as in the individual treatment arms of the ABCSG06 and 08 studies, using distant metastasis as endpoint.

FIG. 2 shows a Kaplan Meier analysis of ER+, HER2−, N0-3 patients from the combined ABCSG06 and 08 cohorts, stratified as high or low risk according to T5 score value.

Herein disclosed are unique combinations of marker genes which can be combined into an algorithm for the new predictive test presented here. Technically, the method of the invention can be practiced using two technologies: 1.) isolation of total RNA from fresh or fixed tumor tissue and 2.) kinetic RT-PCR of the isolated nucleic acids. Alternatively, it is contemplated to measure expression levels using other technologies, e.g. by microarray or by measurement at a protein level.

The methods of the invention are based on quantitative determination of RNA species isolated from the tumor in order to obtain expression values and subsequent bioinformatic analysis of said determined expression values. RNA species may be isolated from any type of tumor sample, e.g. biopsy samples, smear samples, resected tumor material, fresh frozen tumor tissue or paraffin embedded and formalin fixed tumor tissue. First, RNA levels of specific combinations of the genes UBE2C, BIRC5, DHCR7, RACGAP1, AURKA, PVALB, NMU, STC2, AZGP1, RBBP8, IL6ST, MGP, PTGER3, CXCL12, ABAT, CDH1, and PIP, as indicated, are determined. Based on these expression values a prognostic score is calculated by a mathematical combination, e.g. according to formulas T5, T1, T4, or T5b (see below). A high score value indicates a high risk for development of distant metastasis, a low score value indicates a low risk of distant metastasis. Consequently, a high score also indicates that the patient is a high risk patient who will benefit from a more aggressive therapy, e.g. cytotoxic chemotherapy.

The present examples are based on identification of prognostic genes using tumors of patients homogeneously treated in the adjuvant setting with tamoxifen. Furthermore, identification of relevant genes has been restricted to tumors classified as ESR1 positive and ERBB2 negative based on RNA expression levels. In addition, genes allowing separation of intermediate risk tumors, e.g. grade 2 tumors, were considered for algorithm development. Finally, a platform transfer from Affymetrix HG_U133a arrays to quantitative real time PCR, as well as a sample type transfer from fresh frozen tissue to FFPE tissue, was performed to ensure robust algorithm performance, independent of platform and tissue type. As a result, determination of the expression level of RNA species from the primary tumor and the subsequent complex and multivariate analysis as described above provides a superior method for prediction of the likelihood of disease recurrence in patients diagnosed with lymph node negative or positive early breast cancer, when treated with tamoxifen only in the adjuvant setting. Thus the test relies on fewer genes than competing tests but provides superior information with respect to sensitivity and negative predictive value, in particular for tumors considered to exhibit an intermediate risk of recurrence based on standard clinical factors.

The total RNA was extracted with a Siemens silica bead-based, fully automated isolation method for RNA from one 10 μm whole FFPE tissue section on a Hamilton MICROLAB STARlet liquid handling robot (17). The robot, buffers and chemicals were part of a Siemens VERSANT® kPCR Molecular System (Siemens Healthcare Diagnostics, Tarrytown, N.Y.; not commercially available in the USA). Briefly, 150 μl FFPE buffer (Buffer FFPE, research reagent, Siemens Healthcare Diagnostics) was added to each section and incubated for 30 minutes at 80° C. with shaking to melt the paraffin. After cooling down, proteinase K was added and incubated for 30 minutes at 65° C. After lysis, residual tissue debris was removed from the lysis fluid by a 15 minute incubation step at 65° C. with 40 μl silica-coated iron oxide beads. The beads with surface-bound tissue debris were separated with a magnet and the lysates were transferred to a standard 2 ml deep well-plate (96 wells). There, the total RNA and DNA were bound to 40 μl unused beads and incubated at room temperature. Chaotropic conditions were produced by the addition of 600 μl lysis buffer. Then, the beads were magnetically separated and the supernatants were discarded. Afterwards, the surface-bound nucleic acids were washed three times followed by magnetization, aspiration and disposal of supernatants. The nucleic acids were then eluted by incubation of the beads with 100 μl elution buffer for 10 minutes at 70° C. with shaking. Finally, the beads were separated and the supernatant was incubated with 12 μl DNase I Mix (2 μl DNase I (RNase free); 10 μl 10× DNase I buffer; Ambion/Applied Biosystems, Darmstadt, Germany) to remove contaminating DNA. After incubation for 30 minutes at 37° C., the DNA-free total RNA solution was aliquoted and stored at −80° C. or directly used for mRNA expression analysis by reverse transcription kinetic PCR (RT-kPCR).

All the samples were analyzed with one-step RT-kPCR for the gene expression of up to three reference genes (RPL37A, CALM2, OAZ1) and up to 16 target genes in an ABI PRISM® 7900HT (Applied Biosystems, Darmstadt, Germany). The SuperScript® III Platinum® One-Step Quantitative RT-PCR System with ROX (6-carboxy-X-rhodamine) (Invitrogen, Karlsruhe, Germany) was used according to the manufacturer's instructions. Respective probes and primers are shown in Table 1. The PCR conditions were as follows: 30 minutes at 50° C., 2 minutes at 95° C., followed by 40 cycles of 15 seconds at 95° C. and 30 seconds at 60° C. All the PCR assays were performed in triplicate. As a surrogate marker for RNA yield, the cycle threshold (Ct) value of the housekeeper gene RPL37A was used as described elsewhere (17). The relative gene expression levels of the target genes were calculated by the delta-Ct method using the formula:

20 − (Ct(target) − mean(Ct(reference genes))).
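The delta-Ct formula above can be sketched in a few lines. The function name `relative_expression` and the Ct values in the usage lines are hypothetical; averaging the replicate measurements of each gene before applying the formula is an assumption, since the text does not state how the triplicates are combined.

```python
from statistics import mean


def relative_expression(ct_target_replicates, ct_reference_genes):
    """Compute 20 - (Ct(target) - mean(Ct(reference genes))).

    ct_target_replicates: Ct values of the target gene, e.g. a triplicate.
    ct_reference_genes: dict mapping each reference gene to its replicate
        Ct values.
    Assumption: replicates are averaged per gene before the formula is applied.
    """
    ct_target = mean(ct_target_replicates)
    ct_reference = mean(mean(values) for values in ct_reference_genes.values())
    return 20.0 - (ct_target - ct_reference)


# Hypothetical Ct values for illustration only.
example = relative_expression(
    [28.0, 28.0, 28.0],
    {"RPL37A": [24.0, 24.0, 24.0], "CALM2": [25.0, 25.0, 25.0], "OAZ1": [26.0, 26.0, 26.0]},
)
```

Because a lower Ct means more template, subtracting the delta-Ct from 20 yields a value that increases with expression.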

A platform transfer from Affymetrix HG_U133a arrays (fresh frozen tissue) to quantitative real time PCR (FFPE tissue) was calculated as follows. Material from 158 patients was measured using both platforms to yield paired samples. Delta-Ct values were calculated from the PCR data. Log2 expressions were calculated from the Affymetrix data by applying a lower bound (setting all values below the lower bound to the lower bound) and then calculating the logarithm of base 2. The application of a lower bound reduces the effect of increased relative measurement noise for low expressed genes/samples; a lower bound of 20 was used, but lower bounds between 0.1 and 200 also work well. A HG_U133a probe set was selected for each PCR-measured gene by maximizing the Pearson correlation coefficient between the delta-Ct value (from PCR) and the log2 expression (from Affymetrix). Other correlation measures will also work well, e.g. the Spearman correlation coefficient. In most cases the best-correlating probe set belonged to the intended gene; for the remaining cases the PCR gene was removed from further processing. Those genes showing a bad correlation between platforms were also removed, where a threshold on the Pearson correlation coefficient of 0.7 was used (values of between 0.5 and 0.8 also work well). The platform transformation was finalized by calculating unsupervised z-transformations for both platforms and combining them; a single PCR delta-Ct value is then transformed to the Affymetrix scale by the following steps: (i) apply the affine linear transformation whose coefficients were determined by z-transformation of the PCR data, (ii) apply the inverse affine linear transformation whose coefficients were determined by z-transformation of the Affymetrix data, (iii) invert log2, i.e. calculate the exponential with respect to base 2. Alternatives to the two-fold z-transformations are linear or higher order regression, robust regression or principal component based methods, which will also work well.
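Steps (i) to (iii) of the platform transfer can be sketched as below for a single gene. This is an illustration under stated assumptions: the function names are hypothetical, the population standard deviation is used for the z-transformation (the text does not specify which estimator was used), and the paired values in the usage lines are made up.

```python
import math
from statistics import mean, pstdev


def log2_with_floor(values, lower_bound=20.0):
    """Set values below the lower bound to the bound, then take log2."""
    return [math.log2(max(v, lower_bound)) for v in values]


def fit_transfer(pcr_delta_ct, affy_raw, lower_bound=20.0):
    """Estimate z-transformation parameters for one gene from paired samples."""
    affy_log2 = log2_with_floor(affy_raw, lower_bound)
    return (mean(pcr_delta_ct), pstdev(pcr_delta_ct),
            mean(affy_log2), pstdev(affy_log2))


def pcr_to_affy(delta_ct, params):
    """Transform one PCR delta-Ct value to the Affymetrix scale."""
    mu_pcr, sd_pcr, mu_affy, sd_affy = params
    z = (delta_ct - mu_pcr) / sd_pcr      # (i) z-transformation of PCR data
    log2_affy = z * sd_affy + mu_affy     # (ii) inverse z-transformation, Affymetrix
    return 2.0 ** log2_affy               # (iii) invert log2


# Made-up paired measurements of one gene in three samples.
params = fit_transfer([10.0, 12.0, 14.0], [64.0, 256.0, 1024.0])
transferred = pcr_to_affy(12.0, params)
```

Steps (i) and (ii) together form one affine map, log2_affy = (sd_affy / sd_pcr) * delta_ct + (mu_affy − mu_pcr * sd_affy / sd_pcr), which has the same scale-and-offset form as the terms in round brackets of the example algorithms.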

The sequences of the primers and probes were as follows:

TABLE 1 Primer and probe sequences for the respective genes:

| Gene | Probe | Seq ID | Forward primer | Seq ID | Reverse primer | Seq ID |
|---|---|---|---|---|---|---|
| ABAT | TCGCCCTAAGAGGCTCTTCCTC | 1 | GGCAACTTGAGGTCTGACTTTTG | 2 | GGTCAGCTCACAAGTGGTGTGA | 3 |
| ADRA2A | TTGTCCTTTCCCCCCTCCGTGC | 4 | CCCCAAGAGCTGTTAGGTATCAA | 5 | TCAATGACATGATCTCAACCAGAA | 6 |
| APOD | CATCAGCTCTCAACTCCTGGTTTAACA | 7 | ACTCACTAATGGAAAACGGAAAGATC | 8 | TCACCTTCGATTTGATTCACAGTT | 9 |
| ASPH | TGGGAGGAAGGCAAGGTGCTCATC | 10 | TGTGCCAACGAGACCAAGAC | 11 | TCGTGCTCAAAGGAGTCATCA | 12 |
| AURKA | CCGTCAGCCTGTGCTAGGCAT | 13 | AATCTGGAGGCAAGGTTCGA | 14 | TCTGGATTTGCCTCCTGTGAA | 15 |
| BIRC5 | AGCCAGATGACGACCCCATAGAGGAACA | 16 | CCCAGTGTTTCTTCTGCTTCAAG | 17 | CAACCGGACGAATGCTTTTT | 18 |
| CELSR2 | ACTGACTTTCCTTCTGGAGCAGGTGGC | 19 | TCCAAGCATGTATTCCAGACTTGT | 20 | TGCCCACAGCCTCTTTTTCT | 21 |
| CHPT1 | CCACGGCCACCGAAGAGGCAC | 22 | CGCTCGTGCTCATCTCCTACT | 23 | CCCAGTGCACATAAAAGGTATGTC | 24 |
| CXCL12 | CCACAGCAGGGTTTCAGGTTCC | 25 | GCCACTACCCCCTCCTGAA | 26 | TCACCTTGCCAACAGTTCTGAT | 27 |
| CYBRD1 | AGGGCATCGCCATCATCGTC | 28 | GTCACCGGCTTCGTCTTCA | 29 | CAGGTCCACGGCAGTCTGT | 30 |
| DCN | TCTTTTCAGCAACCCGGTCCA | 31 | AAGGCTTCTTATTCGGGTGTGA | 32 | TGGATGGCTGTATCTCCCAGTA | 33 |
| DHCR7 | TGAGCGCCCACCCTCTCGA | 34 | GGGCTCTGCTTCCCGATT | 35 | AGTCATAGGGCAAGCAGAAAATTC | 36 |
| E2F8 | CAGGATACCTAATCCCTCTCACGCAG | 37 | AAATGTCTCCGCAACCTTGTTC | 38 | CTGCCCCCAGGGATGAG | 39 |
| EPHX2 | TGAAGCGGGAGGACTTTTTGTAAA | 40 | CGATGAGAGTGTTTTATCCATGCA | 41 | GCTGAGGCTGGGCTCTTCT | 42 |
| ESR1 | ATGCCCTTTTGCCGATGCA | 43 | GCCAAATTGTGTTTGATGGATTAA | 44 | GACAAAACCGAGTCACATCAGTAATAG | 45 |
| GJA1 | TGCACAGCCTTTTGATTTCCCCGAT | 46 | CGGGAAGCACCATCTCTAACTC | 47 | TTCATGTCCAGCAGCTAGTTTTTT | 48 |
| HSPA2 | CAAGTCAGCAAACACGCAAAA | 49 | CATGCACGAACTAATCAAAAATGC | 50 | ACATTATTCGAGGTTTCTCTTTAATGC | 51 |
| IL6ST | CAAGCTCCACCTTCCAAAGGACCT | 52 | CCCTGAATCCATAAAGGCATACC | 53 | CAGCTTCGTTTTTCCCTACTTTTT | 54 |
| INPP4B | TCCGAGCGCTGGATTGCATGAG | 55 | GCACCAGTTACACAAGGACTTCTTT | 56 | TCTCTATGCGGCATCCTTCTC | 57 |
| MAPT | AGACTATTTGCACACTGCCGCCT | 58 | GTGGCTCAAAGGATAATATCAAACAC | 59 | ACCTTGCTCAGGTCAACTGGTT | 60 |
| MGP | CCTTCATATCCCCTCAGCAGAGATGG | 61 | CCTTCATTAACAGGAGAAATGCAA | 62 | ATTGAGCTCGTGGACAGGCTTA | 63 |
| NEK2 | TCCTGAACAAATGAATCGCATGTCCTACAA | 64 | ATTTGTTGGCACACCTTATTACATGT | 65 | AAGCAGCCCAATGACCAGATa | 66 |
| PCNA | AAATACTAAAATGCGCCGGCAATGA | 67 | GGGCGTGAACCTCACCAGTA | 68 | CTTCGGCCCTTAGTGTAATGATATC | 69 |
| PGR | TTGATAGAAACGCTGTGAGCTCGA | 70 | AGCTCATCAAGGCAATTGGTTT | 71 | ACAAGATCATGCAAGTTATCAAGAAGTT | 72 |
| PIP | TGCATGGTGGTTAAAACTTACCTCA | 73 | TGCTTGCAGTTCAAACAGAATTG | 74 | CACCTTGTAGAGGGATGCTGCTA | 75 |
| PLAT | CAGAAAGTGGCCATGCCACCCTG | 76 | TGGGAAGACATGAATGCACACTA | 77 | GGAGGTTGGGCTTTAGCTGAA | 78 |
| PRSS16 | CACTGCCGGTCACCCACACCA | 79 | CTGAGGAGCACAGAACCTCAACT | 80 | CGAACTCGGTACATGTCTGATACAA | 81 |
| PTGER3 | TCGGTCTGCTGGTCTCCGCTCC | 82 | CTGATTGAAGATCATTTTCAACATCA | 83 | GACGGCCATTCAGCTTATGG | 84 |
| PTPRT | TTGGCTTCTGGACACCCTCACA | 85 | GAGTTGTGGCCTCTACCATTGC | 86 | GAGCGGGAACCTTGGGATAG | 87 |
| RACGAP1 | ACTGAGAATCTCCACCCGGCGCA | 88 | TCGCCAACTGGATAAATTGGA | 89 | GAATGTGCGGAATCTGTTTGAG | 90 |
| RBBP8 | ACCGATTCCGCTACATTCCACCCAAC | 91 | AGAAATTGGCTTCCTGCTCAAG | 92 | AAAACCAACTTCCCAAAAATTCTCT | 93 |
| SCUBE2 | CTAGAGGGTTCCAGGTCCCATACGTGACATA | 94 | TGTGGATTCAGTTCAAGTCCAATG | 95 | CCATCTCGAACTATGTCTTCAATGAGT | 96 |
| SEC14L2 | TGGGAGGCATGCAACGCGTG | 97 | AGGTCTTACTAAGCAGTCCCATCTCT | 98 | CGACCGGCACCTGAACTC | 99 |
| SQLE | TATGCGTCTCCCAAAAGAAGAACACCTCG | 100 | GCAAGCTTCCTTCCTCCTTCA | 101 | CCTTTAGCAGTTTTCTCCATAGTTTTATATC | 102 |
| TFAP2B | CAACACCACCACTAACAGGCACACGTC | 103 | GGCATGGACAAGATGTTCTTGA | 104 | CCTCCTTGTCGCCAGTTTTACT | 105 |
| TOP2A | CAGATCAGGACCAAGATGGTTCCCACAT | 106 | CATTGAAGACGCTTCGTTATGG | 107 | CCAGTTGTGATGGATAAAATTAATCAG | 108 |
| TRIM29 | TGCTGTCTCACTACCGGCCATTCTACG | 109 | TGGAAATCTGGCAAGCAGACT | 110 | CAATCCCGTTGCCTTTGTTG | 111 |
| UBE2C | TGAACACACATGCTGCCGAGCTCTG | 112 | CTTCTAGGAGAACCCAACATTGATAGT | 113 | GTTTCTTGCAGGTACTTCTTAAAAGCT | 114 |
| WNT5A | TATTCACATCCCCTCAGTTGCAGTGAATTG | 115 | CTGTGGCTCTTAATTTATTGCATAATG | 116 | TTAGTGCTTTTTGCTTTCAAGATCTT | 117 |
| STC2 | TCTCACCTTGACCCTCAGCCAAG | 118 | ACATTTGACAAATTTCCCTTAGGATT | 119 | CCAGGACGCAGCTTTACCAA | 120 |
| AZGP1 | CACCAGCCACCAGGCCCCAG | 121 | TCCTGGACCGGCAAGATC | 122 | TAGGCCAGGCACTTCAGTTTC | 123 |
| CALM2 | TCGCGTCTCGGAAACCGGTAGC | 124 | GAGCGAGCTGAGTGGTTGTG | 125 | AGTCAGTTGGTCAGCCATGCT | 126 |
| CDH1 | CCTGCCAATCCCGATGAAATTGGAAAT | 127 | TGAGTGTCCCCCGGTATCTTC | 128 | TCAGCCGCTTTCAGATTTTCA | 129 |
| NMU | ACCCTGCTGACCTTCTTCCATTCCGT | 130 | AGAAATTGGCTTCCTGCTCAAG | 131 | AAAACCAACTTCCCAAAAATTCTCT | 132 |
| OAZ1 | TGCTTCCACAAGAACCGCGAGGA | 133 | CGAGCCGACCATGTCTTCAT | 134 | AAGCCCAAAAAGCTGAAGGTT | 135 |
| PVALB | AAGTTCTTCCAAATGGTCGGCC | 136 | CCGACTCCTTCGACCACAA | 137 | CATCATCCGCACTCTTTTTCTTC | 138 |
| RPL37A | TGGCTGGCGGTGCCTGGA | 139 | TGTGGTTCCTGCATGAAGACA | 140 | GTGACAGCGGAAGTGGTATTGTAC | 141 |

Table 2, below, lists the genes used in the methods of the invention and in the particular embodiments T5, T1, T4, and T5b. Table 2 also shows whether overexpression of a given gene is indicative of good or bad outcome under Tamoxifen therapy. Table 2 lists the function of the gene, the compartment localization within the cell and the cellular processes it is involved in.

TABLE 2 List of genes of algorithms T5, T1, T4, and T5b:

| Gene | Name | High Expression | Function | Component | Process |
|---|---|---|---|---|---|
| UBE2C | ubiquitin-conjugating enzyme E2C | Bad Outcome | ATP binding | cytosol | cell division |
| BIRC5 | baculoviral IAP repeat-containing 5 | Bad Outcome | Ran GTPase binding | cytosol | cell cycle |
| DHCR7 | 7-dehydrocholesterol reductase | Bad Outcome | 7-dehydrocholesterol reductase activity | endoplasmatic reticulum membrane | regulation of cell proliferation |
| RACGAP1 | Rac GTPase activating protein 1 | Bad Outcome | GTPase activator activity | cytoplasm | cell cycle |
| AURKA | aurora kinase A | Bad Outcome | ATP binding | centrosome | mitotic cell cycle |
| PVALB | parvalbumin | Bad Outcome | calcium ion binding | | |
| NMU | neuromedin U | Bad Outcome | receptor binding | extracellular region | signal transduction |
| STC2 | stanniocalcin 2 | Good Outcome | hormone activity | extracellular region | cell surface receptor linked signal transduction |
| AZGP1 | alpha-2-glycoprotein 1 | Good Outcome | protein transmembrane transporter activity | extracellular region | negative regulation of cell proliferation |
| RBBP8 | retinoblastoma binding protein 8 | Good Outcome | protein binding | nucleus | cell cycle checkpoint |
| IL6ST | interleukin 6 signal transducer | Good Outcome | receptor activity | extracellular region | signal transduction |
| MGP | matrix Gla protein | Good Outcome | extracellular matrix structural constituent | extracellular region | cell differentiation |
| PTGER3 | prostaglandin E receptor 3 | Good Outcome | ligand-dependent receptor activity | plasma membrane | signal transduction |
| CXCL12 | chemokine (C-X-C motif) ligand 12 | Good Outcome | chemokine activity | extracellular region | signal transduction |
| ABAT | 4-aminobutyrate aminotransferase | Good Outcome | transferase activity | mitochondrion | gamma-aminobutyric acid catabolic process |
| CDH1 | cadherin 1 | Good Outcome | cell adhesion molecule binding | plasma membrane | homophilic cell adhesion |
| PIP | prolactin-induced protein | Good Outcome | actin binding | extracellular region | |
| CALM2 | Reference Gene | | | | |
| OAZ1 | Reference Gene | | | | |
| RPL37A | Reference Gene | | | | |

Table 3, below, shows the combinations of genes used for each algorithm.

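TABLE 3 Combination of genes for the respective algorithms:

| Gene | Algo_T1 | Algo_T4 | Algo_T5 | Algo_T5b |
|---|---|---|---|---|
| UBE2C | | | X | |
| BIRC5 | X | X | X | |
| DHCR7 | | X | X | X |
| RACGAP1 | | X | | X |
| AURKA | X | | | |
| PVALB | X | X | | |
| NMU | X | | | X |
| STC2 | X | X | X | |
| AZGP1 | | | X | X |
| RBBP8 | X | | X | X |
| IL6ST | | X | X | X |
| MGP | | | X | X |
| PTGER3 | X | X | | |
| CXCL12 | X | X | | |
| ABAT | | X | | |
| CDH1 | X | | | |
| PIP | X | | | |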

Table 4, below, shows the Affy probeset ID and TaqMan design ID mapping of the marker genes of the present invention.

TABLE 4 Gene symbol, Affy probeset ID and TaqMan design ID mapping:

| Gene | Design ID | Probeset ID |
|---|---|---|
| UBE2C | R65 | 202954_at |
| BIRC5 | SC089 | 202095_s_at |
| DHCR7 | CAGMC334 | 201791_s_at |
| RACGAP1 | R125-2 | 222077_s_at |
| AURKA | CAGMC336 | 204092_s_at |
| PVALB | CAGMC339 | 205336_at |
| NMU | CAGMC331 | 206023_at |
| STC2 | R52 | 203438_at |
| AZGP1 | CAGMC372 | 209309_at |
| RBBP8 | CAGMC347 | 203344_s_at |
| IL6ST | CAGMC312 | 212196_at |
| MGP | CAGMC383 | 202291_s_at |
| PTGER3 | CAGMC315 | 213933_at |
| CXCL12 | CAGMC342 | 209687_at |
| ABAT | CAGMC338 | 209460_at |
| CDH1 | CAGMC335 | 201131_s_at |

Table 5, below, shows full names, Entrez GeneID, gene bank accession number and chromosomal location of the marker genes of the present invention.

| Official Symbol | Official Full Name | Entrez GeneID | Accession Number | Location |
|---|---|---|---|---|
| UBE2C | ubiquitin-conjugating enzyme E2C | 11065 | U73379 | 20q13.12 |
| BIRC5 | baculoviral IAP repeat-containing 5 | 332 | U75285 | 17q25 |
| DHCR7 | 7-dehydrocholesterol reductase | 1717 | AF034544 | 11q13.4 |
| STC2 | stanniocalcin 2 | 8614 | AB012664 | 5q35.2 |
| RBBP8 | retinoblastoma binding protein 8 | 5932 | AF043431 | 18q11.2 |
| IL6ST | interleukin 6 signal transducer | 3572 | M57230 | 5q11 |
| MGP | matrix Gla protein | 4256 | M58549 | 12p12.3 |
| AZGP1 | alpha-2-glycoprotein 1, zinc-binding | 563 | BC005306 | 11q22.1 |
| RACGAP1 | Rac GTPase activating protein 1 | 29127 | NM_013277 | 12q13 |
| AURKA | aurora kinase A | 6790 | BC001280 | 20q13 |
| PVALB | parvalbumin | 5816 | NM_002854 | 22q13.1 |
| NMU | neuromedin U | 10874 | X76029 | 4q12 |
| PTGER3 | prostaglandin E receptor 3 (subtype EP3) | 5733 | X83863 | 1p31.2 |
| CXCL12 | chemokine (C-X-C motif) ligand 12 (stromal cell-derived factor 1) | 6387 | L36033 | 10q11.1 |
| ABAT | 4-aminobutyrate aminotransferase | 18 | L32961 | 16p13.2 |
| CDH1 | cadherin 1, type 1, E-cadherin (epithelial) | 999 | L08599 | 16q22.1 |
| PIP | prolactin-induced protein | 5304 | NM_002652 | 7q32-qter |

EXAMPLE ALGORITHM T5

Algorithm T5 is a committee of four members where each member is a linear combination of two genes. The mathematical formulas for T5 are shown below; the notation is the same as for T1. T5 can be calculated from gene expression data only.

riskMember1 = 0.434039 [0.301 … 0.567] * (0.939 * BIRC5 − 3.831)
            − 0.491845 [−0.714 … −0.270] * (0.707 * RBBP8 − 0.934)
riskMember2 = 0.488785 [0.302 … 0.675] * (0.794 * UBE2C − 1.416)
            − 0.374702 [−0.570 … −0.179] * (0.814 * IL6ST − 5.034)
riskMember3 = −0.39169 [−0.541 … −0.242] * (0.674 * AZGP1 − 0.777)
            + 0.44229 [0.256 … 0.628] * (0.891 * DHCR7 − 4.378)
riskMember4 = −0.377752 [−0.543 … −0.212] * (0.485 * MGP + 4.330)
            − 0.177669 [−0.267 … −0.088] * (0.826 * STC2 − 3.630)
risk = riskMember1 + riskMember2 + riskMember3 + riskMember4

Coefficients on the left of each line were calculated as COX proportional hazards regression coefficients; the numbers in square brackets denote 95% confidence bounds for these coefficients. In other words, instead of multiplying the term (0.939*BIRC5−3.831) with 0.434039, it may be multiplied with any coefficient between 0.301 and 0.567 and still give a predictive result within the 95% confidence bounds. Terms in round brackets on the right of each line denote a platform transfer from PCR to Affymetrix: the variables BIRC5, RBBP8, . . . denote PCR-based expressions normalized by the reference genes (delta-Ct values), and the whole term within round brackets corresponds to the logarithm (base 2) of Affymetrix microarray expression values of corresponding probe sets.
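As an illustration, the T5 formulas above can be evaluated with a few lines of code. This is a sketch, not the applicant's software: the names `T5_MEMBERS` and `t5_score` are hypothetical, the point estimates of the coefficients are used (not the confidence bounds), and the input is assumed to be a mapping from gene symbol to the reference-gene-normalized PCR expression value described in the text. The input values in the usage lines are made up.

```python
# Each term: (Cox coefficient, gene, platform-transfer scale, platform-transfer offset),
# transcribed from the T5 formulas in the text.
T5_MEMBERS = [
    [(0.434039, "BIRC5", 0.939, -3.831), (-0.491845, "RBBP8", 0.707, -0.934)],
    [(0.488785, "UBE2C", 0.794, -1.416), (-0.374702, "IL6ST", 0.814, -5.034)],
    [(-0.39169, "AZGP1", 0.674, -0.777), (0.44229, "DHCR7", 0.891, -4.378)],
    [(-0.377752, "MGP", 0.485, 4.330), (-0.177669, "STC2", 0.826, -3.630)],
]


def t5_score(expression):
    """Sum the four committee members of T5.

    expression: dict mapping gene symbol to its normalized PCR expression value.
    """
    risk = 0.0
    for member in T5_MEMBERS:
        for coefficient, gene, scale, offset in member:
            risk += coefficient * (scale * expression[gene] + offset)
    return risk


# Made-up input values for illustration only.
sample = {"BIRC5": 10.0, "RBBP8": 12.0, "UBE2C": 9.5, "IL6ST": 14.0,
          "AZGP1": 13.0, "DHCR7": 11.0, "MGP": 15.0, "STC2": 12.5}
score = t5_score(sample)
```

Genes listed with "Bad Outcome" in Table 2 enter with positive coefficients and genes listed with "Good Outcome" with negative coefficients, so higher expression of the former raises the score and higher expression of the latter lowers it.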

Performance of the algorithm T5 was tested in Tamoxifen or Anastrozole treated patients with no more than 3 positive lymph nodes and ER+, HER2− tumors, who participated in the randomized clinical trials ABCSG06 (n=332) or ABCSG08 (n=1244). As shown in FIG. 1, Cox regression analysis reveals that the T5 score has a significant association with the development of distant metastasis in all cohorts tested.

Kaplan Meier analysis was performed after classifying the patients of the combined ABCSG cohorts using a predefined cutoff for the T5 score. Patients with a low risk of development of a distant metastasis had a T5 score ≤ −9.3, while patients with a high risk of development of a distant metastasis had a T5 score above −9.3. As shown in FIG. 2, a highly significant separation of both risk groups is observed.

Importantly, the T5 score was evaluated and compared against “Adjuvant!Online”, an online tool to aid in therapy selection based on entry of clinical parameters such as tumor size, tumor grade and nodal status. When the T5 score was tested by bivariate Cox regression against the Adjuvant!Online Relapse Risk score, both scores retained a significant association with the development of distant metastasis. Bivariate Cox regression using dichotomized data, which were stratified according to T5 (cutoff −9.3) and to Adjuvant!Online (cutoff 8), respectively, again yielded highly significant and independent associations with time to metastasis as clinical endpoint.

TABLE 6 Bivariate Cox regression of T5 and Adjuvant!Online

| Variable | Hazard ratio | 95% CI* | P |
|---|---|---|---|
| Adjuvant!Online | 2.36 | 1.58-3.54 | <0.0001 |
| Gene-expression signature (risk group) | 2.62 | 1.71-4.01 | <0.0001 |
| Adjuvant!Online (score) | 1.04 | 1.02-1.06 | <0.0001 |
| Gene-expression signature (risk group) | 1.35 | 1.21-1.49 | <0.0001 |

* with HR = Hazard Ratio, 95% CI = 95% Confidence Interval, P = P value.

Exemplary Kaplan Meier curves are shown in FIG. 2, wherein High = High Risk Group and Low = Low Risk Group according to a predefined cutoff.

A high value of the T5 score indicates an increased risk of occurrence of distant metastasis in a given time period.

This has been shown to be the case for patients having been treated with tamoxifen and also for patients having been treated with aromatase inhibitors.

EXAMPLE ALGORITHM T1

Algorithm T1 is a committee of three members where each member is a linear combination of up to four variables. In general, variables may be gene expressions or clinical variables. In T1 the only non-gene variable is the nodal status, coded 0 if the patient is lymph-node negative and 1 if the patient is lymph-node positive. The mathematical formulas for T1 are shown below.

riskMember1 = +0.193935 [0.108 … 0.280] * (0.792 * PVALB − 2.189)
            − 0.240252 [−0.400 … −0.080] * (0.859 * CDH1 − 2.900)
            − 0.270069 [−0.385 … −0.155] * (0.821 * STC2 − 3.529)
            + 1.2053 [0.534 … 1.877] * nodalStatus
riskMember2 = −0.25051 [−0.437 … −0.064] * (0.558 * CXCL12 + 0.324)
            − 0.421992 [−0.687 … −0.157] * (0.715 * RBBP8 − 1.063)
            + 0.148497 [0.029 … 0.268] * (1.823 * NMU − 12.563)
            + 0.293563 [0.108 … 0.479] * (0.989 * BIRC5 − 4.536)
riskMember3 = +0.308391 [0.074 … 0.543] * (0.812 * AURKA − 2.656)
            − 0.225358 [−0.395 … −0.055] * (0.637 * PTGER3 + 0.492)
            − 0.116312 [−0.202 … −0.031] * (0.724 * PIP + 0.985)
risk = riskMember1 + riskMember2 + riskMember3

Coefficients on the left of each line were calculated as COX proportional hazards regression coefficients; the numbers in square brackets denote 95% confidence bounds for these coefficients. Terms in round brackets on the right of each line denote a platform transfer from PCR to Affymetrix: the variables PVALB, CDH1, . . . denote PCR-based expressions normalized by the reference genes, and the whole term within round brackets corresponds to the logarithm (base 2) of Affymetrix microarray expression values of corresponding probe sets.

EXAMPLE ALGORITHM T4

Algorithm T4 is a linear combination of motifs. The top 10 genes of several analyses of Affymetrix datasets and PCR data were clustered to motifs. Genes not belonging to a cluster were used as single gene-motifs. COX proportional hazards regression coefficients were found in a multivariate analysis.

In general, motifs may be single gene expressions or mean gene expressions of correlated genes. The mathematical formulas for T4 are shown below.

prolif = ((0.84 [0.697 … 0.977] * RACGAP1 − 2.174) + (0.85 [0.713 … 0.988] * DHCR7 − 3.808) + (0.94 [0.786 … 1.089] * BIRC5 − 3.734)) / 3

motiv2 = ((0.83 [0.693 … 0.96] * IL6ST − 5.295) + (1.11 [0.930 … 1.288] * ABAT − 7.019) + (0.84 [0.701 … 0.972] * STC2 − 3.857)) / 3

ptger3 = (PTGER3 * 0.57 [0.475 … 0.659] + 1.436)

cxcl12 = (CXCL12 * 0.53 [0.446 … 0.618] − 0.847)

pvalb = (PVALB * 0.67 [0.558 … 0.774] − 0.466)

Factors and offsets for each gene denote a platform transfer from PCR to Affymetrix: the variables RACGAP1, DHCR7, . . . denote PCR-based expressions normalized by CALM2 and PPIA, and the whole term within round brackets corresponds to the logarithm (base 2) of Affymetrix microarray expression values of corresponding probe sets.

The numbers in square brackets denote 95% confidence bounds for these factors.

As the algorithm performed even better in combination with a clinical variable, the nodal status was added. In T4 the nodal status is coded 0 if the patient is lymph-node negative and 1 if the patient is lymph-node positive. With this, algorithm T4 is:

risk = −0.32 [−0.510 … −0.137] * motiv2
     + 0.65 [0.411 … 0.886] * prolif
     − 0.24 [−0.398 … −0.08] * ptger3
     − 0.05 [−0.225 … 0.131] * cxcl12
     + 0.09 [0.019 … 0.154] * pvalb
     + nodalStatus

Coefficients of the risk were calculated as COX proportional hazards regression coefficients; the numbers in square brackets denote 95% confidence bounds for these coefficients.
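The motif structure of T4 can be sketched as follows. This is an illustration under stated assumptions: the names `T4_TRANSFER`, `T4_MOTIFS`, `T4_WEIGHTS` and `t4_score` are hypothetical, point estimates are used for all factors and coefficients, and the nodal status enters with a weight of 1 because the formula as printed shows no coefficient for it.

```python
from statistics import mean

# (factor, offset) of the PCR-to-Affymetrix transfer per gene, from the text.
T4_TRANSFER = {
    "RACGAP1": (0.84, -2.174), "DHCR7": (0.85, -3.808), "BIRC5": (0.94, -3.734),
    "IL6ST": (0.83, -5.295), "ABAT": (1.11, -7.019), "STC2": (0.84, -3.857),
    "PTGER3": (0.57, 1.436), "CXCL12": (0.53, -0.847), "PVALB": (0.67, -0.466),
}

# A motif is the mean of its transferred gene values; single genes form their own motif.
T4_MOTIFS = {
    "prolif": ["RACGAP1", "DHCR7", "BIRC5"],
    "motiv2": ["IL6ST", "ABAT", "STC2"],
    "ptger3": ["PTGER3"],
    "cxcl12": ["CXCL12"],
    "pvalb": ["PVALB"],
}

# Cox regression coefficients of the motifs, from the text.
T4_WEIGHTS = {"motiv2": -0.32, "prolif": 0.65, "ptger3": -0.24,
              "cxcl12": -0.05, "pvalb": 0.09}


def t4_score(expression, nodal_status):
    """Evaluate T4 for normalized PCR expression values and a nodal status of 0 or 1."""
    motif_values = {
        name: mean(T4_TRANSFER[g][0] * expression[g] + T4_TRANSFER[g][1] for g in genes)
        for name, genes in T4_MOTIFS.items()
    }
    gene_part = sum(T4_WEIGHTS[name] * value for name, value in motif_values.items())
    return gene_part + nodal_status
```

Averaging correlated genes into a motif before weighting means that a one-unit change in a single proliferation gene moves the score by only a third of the motif coefficient times that gene's transfer factor.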

Algorithm T5b is a committee of two members where each member is alinear combination of four genes. The mathematical formulas for T5b areshown below, the notation is the same as for T1 and T5. In T5b anon-gene variable is the nodal status coded 0, if patient is lymph-nodenegative and 1, if patient is lymph-node-positive and 0.5 if thelymph-node status is unknown. T5b is defined by:

riskMember 1 = 0.359536  [0.153  …  0.566] * (0.891 * DHCR 7  − 4.378) − 0.288119  [−0.463  …   − 0.113] * (0.485 * MGP+ 4.330) + 0.257341  [0.112  …  0.403] * (1.118 * NMU  − 5.128) − 0.337663  [−0.499  …   − 0.176] * (0.674 * AZGP 1  − 0.777)riskMember 2= −0.374940  [−0.611  …   − 0.139] * (0.717 * RBBP 8  − 0.934) − 0.387371  [−0.597  …   − 0.178] * (0.814 * IL 6 ST  − 5.034) + 0.800745  [0.551  …  1.051] * (0.860 * RACGAP 1  − 2.518) + 0.770650  [0.323  …  1.219] * Nodalstatusrisk = riskMember 1 + riskMember 2

The skilled person understands that these algorithms represent particular examples and that, based on the information regarding association of gene expression with outcome as given in table 2, alternative algorithms can be established using routine skills.

Algorithm Simplification by Employing Subsets of Genes

“Example algorithm T5” is a committee predictor consisting of 4 members with 2 genes of interest each. Each member is an independent and self-contained predictor of distant recurrence; each additional member contributes to robustness and predictive power of the algorithm to predict time to metastasis, time to death or likelihood of survival for a breast cancer patient. The equation below shows the “Example Algorithm T5”; for ease of reading the number of digits after the decimal point has been truncated to 2; the range in square brackets lists the estimated range of the coefficients (mean+/−3 standard deviations).

T 5  Algorithm: + 0.41  [0.21  …  0.61] * BIRC 5  − 0.33  [−0.57  …   − 0.09] * RBBP 8 + 0.38  [0.15  …  0.61] * UBE 2 C  − 0.30  [−0.55  …   − 0.06] * IL 6 ST − 0.28  [−0.43  …   − 0.12] * AZGP 1  + 0.42  [0.16  …  0.68] * DHCR 7 − 0.18  [−0.31  …   − 0.06] * MGP  − 0.13  [−0.25  …   − 0.02] * STC 2c-indices:  trainSet = 0.724,  

Gene names in the algorithm denote the difference of the mRNA expression of the gene compared to one or more housekeeping genes as described above.
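The committee structure of the “Example algorithm T5” can be sketched as follows. The sketch assumes that the score is the plain sum of the member terms with the two-digit coefficients given above, and that each gene value is the expression difference relative to the housekeeping genes; the names `T5_MEMBERS`, `t5_score` and the optional `members` argument are hypothetical and serve only to show how a committee of fewer members could be evaluated with the same coefficients.

```python
# Coefficients of the four T5 committee members (two genes each),
# truncated to two decimals as in the text.
T5_MEMBERS = {
    1: {"BIRC5": 0.41, "RBBP8": -0.33},
    2: {"UBE2C": 0.38, "IL6ST": -0.30},
    3: {"AZGP1": -0.28, "DHCR7": 0.42},
    4: {"MGP": -0.18, "STC2": -0.13},
}


def t5_score(delta, members=(1, 2, 3, 4)):
    """Sum the linear terms of the selected committee members.

    delta: dict mapping gene name to its expression difference
           relative to the housekeeping genes.
    members: committee members to include (default: all four).
    """
    return sum(coef * delta[gene]
               for m in members
               for gene, coef in T5_MEMBERS[m].items())
```

Calling `t5_score(delta)` evaluates the full committee, while `t5_score(delta, members=(1,))` evaluates member 1 alone without any retraining of the coefficients.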

Analysing a cohort different from the finding cohort (234 tumor samples) it was surprising to learn that some simplifications of the “original T5 Algorithm” still yielded a diagnostic performance not significantly inferior to the original T5 algorithm. The most straightforward simplification was reducing the committee predictor to one member only. Examples for the performance of the “one-member committees” are shown below:

member 1 only:
+0.41 [0.21 . . . 0.61]*BIRC5 − 0.33 [−0.57 . . . −0.09]*RBBP8
c-indices: trainSet = 0.653, independentCohort = 0.681

member 2 only:
+0.38 [0.15 . . . 0.61]*UBE2C − 0.30 [−0.55 . . . −0.06]*IL6ST
c-indices: trainSet = 0.664, independentCohort = 0.696

member 3 only:
−0.28 [−0.43 . . . −0.12]*AZGP1 + 0.42 [0.16 . . . 0.68]*DHCR7
c-indices: trainSet = 0.666, independentCohort = 0.601

member 4 only:
−0.18 [−0.31 . . . −0.06]*MGP − 0.13 [−0.25 . . . −0.02]*STC2
c-indices: trainSet = 0.668, independentCohort = 0.681

The performance of the one-member committees as shown in an independent cohort of 234 samples is notably reduced compared to the performance of the full algorithm. Still, using a committee consisting of fewer members allows for a simpler, less costly estimate of the risk of breast cancer recurrence or breast cancer death that might be acceptable for certain diagnostic purposes.
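The c-indices quoted above can be computed along the following lines. This is a minimal sketch assuming that the c-index is Harrell's concordance index for right-censored follow-up data, which the text does not state explicitly; the function name, the argument layout and the counting of tied scores as one half are assumptions.

```python
def concordance_index(times, events, scores):
    """Harrell-type concordance index (assumed definition).

    times:  observed follow-up times.
    events: 1 if the event (e.g. distant recurrence) was observed,
            0 if the patient was censored.
    scores: risk scores; a higher score should mean an earlier event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if patient i had an observed event
            # strictly before the observed time of patient j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied scores count as one half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to a score without prognostic information and 1.0 to a perfect ranking of patients by time to event.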

Gradually combining more than one but fewer than four members into a new prognostic committee predictor algorithm frequently leads to a small but significant increase in the diagnostic performance compared to a one-member committee. It was surprising to learn that there were marked improvements by some combinations of committee members while other combinations yielded next to no improvement. Initially, the hypothesis was that a combination of members representing similar biological motifs, as reflected by the employed genes, would yield a smaller improvement than combining members reflecting distinctly different biological motifs. Still, this was not the case. No rule could be identified to foretell which combination of genes would generate an algorithm exhibiting more prognostic power than another combination of genes. Promising combinations could only be selected based on experimental data.

Identified combinations of committee members that yield simplified yet powerful algorithms are shown below.

members 1 and 2 only:
+0.41 [0.21 . . . 0.61]*BIRC5 − 0.33 [−0.57 . . . −0.09]*RBBP8
+0.38 [0.15 . . . 0.61]*UBE2C − 0.30 [−0.55 . . . −0.06]*IL6ST
c-indices: trainSet = 0.675, independentCohort = 0.712

members 1 and 3 only:
+0.41 [0.21 . . . 0.61]*BIRC5 − 0.33 [−0.57 . . . −0.09]*RBBP8
−0.28 [−0.43 . . . −0.12]*AZGP1 + 0.42 [0.16 . . . 0.68]*DHCR7
c-indices: trainSet = 0.697, independentCohort = 0.688

members 1 and 4 only:
+0.41 [0.21 . . . 0.61]*BIRC5 − 0.33 [−0.57 . . . −0.09]*RBBP8
−0.18 [−0.31 . . . −0.06]*MGP − 0.13 [−0.25 . . . −0.02]*STC2
c-indices: trainSet = 0.705, independentCohort = 0.679

members 2 and 3 only:
+0.38 [0.15 . . . 0.61]*UBE2C − 0.30 [−0.55 . . . −0.06]*IL6ST
−0.28 [−0.43 . . . −0.12]*AZGP1 + 0.42 [0.16 . . . 0.68]*DHCR7
c-indices: trainSet = 0.698, independentCohort = 0.670

members 1, 2 and 3 only:
+0.41 [0.21 . . . 0.61]*BIRC5 − 0.33 [−0.57 . . . −0.09]*RBBP8
+0.38 [0.15 . . . 0.61]*UBE2C − 0.30 [−0.55 . . . −0.06]*IL6ST
−0.28 [−0.43 . . . −0.12]*AZGP1 + 0.42 [0.16 . . . 0.68]*DHCR7
c-indices: trainSet = 0.701, independentCohort = 0.715

Omitting not complete committee members but a single gene, or genes from different committee members, is also possible but requires a retraining of the entire algorithm. Still, this can also be advantageous. The performance of simplified algorithms generated by omitting entire members or individual genes is largely identical.

Algorithm Variants by Gene Replacement

The algorithms described above, such as “Example algorithm T5”, can also be modified by replacing one or more genes by one or more other genes. The purpose of such modifications is to replace genes that are difficult to measure on a specific platform by a gene more straightforward to assay on this platform. While such a transfer may not necessarily yield an improved performance compared to a starting algorithm, it can be the key to implementing the prognostic algorithm on a particular diagnostic platform. In general, replacing one gene by another gene while preserving the diagnostic power of the predictive algorithm can be best accomplished by replacing one gene by a co-expressed gene with a high correlation (shown e.g. by the Pearson correlation coefficient). Still, one has to keep in mind that the mRNA expression of two genes that are highly correlated on one platform may appear quite independent from each other when assessed on another platform. Accordingly, such an apparently easy replacement, when reduced to practice experimentally, may yield disappointingly poor results as well as surprisingly strong results, always depending on the imponderabilia of the platform employed. By repeating this procedure one can replace several genes.
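The selection of a co-expressed replacement gene by correlation can be sketched as follows. The sketch ranks candidate genes by the absolute value of their Pearson correlation coefficient with the gene to be replaced; the function names and the input layout (one list of expression values per gene, measured on the same samples) are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_replacements(target, candidates):
    """Rank candidate genes by absolute correlation with the target.

    target:     expression values of the gene to be replaced.
    candidates: dict mapping candidate gene name to expression values
                measured on the same samples.
    Returns a list of (gene, |r|) sorted by descending |r|.
    """
    scored = [(gene, abs(pearson(target, values)))
              for gene, values in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

As noted above, a high correlation on one platform does not guarantee a comparable correlation on another platform, so such a ranking would have to be repeated with data from the platform actually employed.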

The efficiency of such an approach can be demonstrated by evaluating the predictive performance of the T5 algorithm score and its variants on the validation cohorts. The following table shows the c-index with respect to the endpoint distant recurrence in two validation cohorts.

TABLE 7
Variant | Validation Study A | Validation Study B
original algorithm T5 | c-index = 0.718 | c-index = 0.686
omission of BIRC5 (setting expression to some constant) | c-index = 0.672 | c-index = 0.643
replacing BIRC5 by UBE2C (no adjustment of the coefficient) | c-index = 0.707 | c-index = 0.678

One can see that omission of one of the T5 genes, shown here for BIRC5 as an example, notably reduces the predictive performance. Replacing it with another gene yields about the same performance as the original algorithm.

A better method of replacing a gene is to re-train the algorithm. Since T5 consists of four independent committee members one has to re-train only the member that contains the replaced gene. The following equations demonstrate replacements of genes of the T5 algorithm shown above, trained in a cohort of 234 breast cancer patients. Only one member is shown below; for c-index calculation the remaining members were used unchanged from the original T5 Algorithm. The range in square brackets lists the estimated range of the coefficients: mean+/−3 standard deviations.

Member 1 of T5:

Original member 1:
+0.41 [0.21 . . . 0.61]*BIRC5 − 0.33 [−0.57 . . . −0.09]*RBBP8
c-indices: trainSet = 0.724, independentCohort = 0.705

replace BIRC5 by TOP2A in member 1:
+0.47 [0.24 . . . 0.69]*TOP2A − 0.34 [−0.58 . . . −0.10]*RBBP8
c-indices: trainSet = 0.734, independentCohort = 0.694

replace BIRC5 by RACGAP1 in member 1:
+0.69 [0.37 . . . 1.00]*RACGAP1 − 0.33 [−0.57 . . . −0.09]*RBBP8
c-indices: trainSet = 0.736, independentCohort = 0.743

replace RBBP8 by CELSR2 in member 1:
+0.38 [0.19 . . . 0.57]*BIRC5 − 0.18 [−0.41 . . . 0.05]*CELSR2
c-indices: trainSet = 0.726, independentCohort = 0.680

replace RBBP8 by PGR in member 1:
+0.35 [0.15 . . . 0.54]*BIRC5 − 0.09 [−0.23 . . . 0.05]*PGR
c-indices: trainSet = 0.727, independentCohort = 0.731

Member 2 of T5:

Original member 2:
+0.38 [0.15 . . . 0.61]*UBE2C − 0.30 [−0.55 . . . −0.06]*IL6ST
c-indices: trainSet = 0.724, independentCohort = 0.725

replace UBE2C by RACGAP1 in member 2:
+0.65 [0.33 . . . 0.96]*RACGAP1 − 0.38 [−0.62 . . . −0.13]*IL6ST
c-indices: trainSet = 0.735, independentCohort = 0.718

replace UBE2C by TOP2A in member 2:
+0.42 [0.20 . . . 0.65]*TOP2A − 0.38 [−0.62 . . . −0.13]*IL6ST
c-indices: trainSet = 0.734, independentCohort = 0.700

replace IL6ST by INPP4B in member 2:
+0.40 [0.17 . . . 0.62]*UBE2C − 0.25 [−0.55 . . . 0.05]*INPP4B
c-indices: trainSet = 0.725, independentCohort = 0.686

replace IL6ST by MAPT in member 2:
+0.45 [0.22 . . . 0.69]*UBE2C − 0.14 [−0.28 . . . 0.01]*MAPT
c-indices: trainSet = 0.727, independentCohort = 0.711

Member 3 of T5:

Original member 3:
−0.28 [−0.43 . . . −0.12]*AZGP1 + 0.42 [0.16 . . . 0.68]*DHCR7
c-indices: trainSet = 0.724, independentCohort = 0.705

replace AZGP1 by PIP in member 3:
−0.10 [−0.18 . . . −0.02]*PIP + 0.43 [0.16 . . . 0.70]*DHCR7
c-indices: trainSet = 0.725, independentCohort = 0.692

replace AZGP1 by EPHX2 in member 3:
−0.23 [−0.43 . . . −0.02]*EPHX2 + 0.37 [0.10 . . . 0.64]*DHCR7
c-indices: trainSet = 0.719, independentCohort = 0.698

replace AZGP1 by PLAT in member 3:
−0.23 [−0.40 . . . −0.06]*PLAT + 0.43 [0.18 . . . 0.68]*DHCR7
c-indices: trainSet = 0.712, independentCohort = 0.715

replace DHCR7 by AURKA in member 3:
−0.23 [−0.39 . . . −0.06]*AZGP1 + 0.34 [0.10 . . . 0.58]*AURKA
c-indices: trainSet = 0.716, independentCohort = 0.733

Member 4 of T5:

Original member 4:
−0.18 [−0.31 . . . −0.06]*MGP − 0.13 [−0.25 . . . −0.02]*STC2
c-indices: trainSet = 0.724, independentCohort = 0.705

replace MGP by APOD in member 4:
−0.16 [−0.30 . . . −0.03]*APOD − 0.14 [−0.26 . . . −0.03]*STC2
c-indices: trainSet = 0.717, independentCohort = 0.679

replace MGP by EGFR in member 4:
−0.21 [−0.37 . . . −0.05]*EGFR − 0.14 [−0.26 . . . −0.03]*STC2
c-indices: trainSet = 0.715, independentCohort = 0.708

replace STC2 by INPP4B in member 4:
−0.18 [−0.30 . . . −0.05]*MGP − 0.22 [−0.53 . . . 0.08]*INPP4B
c-indices: trainSet = 0.719, independentCohort = 0.693

replace STC2 by SEC14L2 in member 4:
−0.18 [−0.31 . . . −0.06]*MGP − 0.27 [−0.49 . . . −0.06]*SEC14L2
c-indices: trainSet = 0.718, independentCohort = 0.681

One can see that replacements of single genes experimentally identified for a quantification with kinetic PCR normally affect the predictive performance of the T5 algorithm, assessed by the c-index, only insignificantly.

The following table (Tab. 8) shows potential replacement gene candidates for the genes of the T5 algorithm. Each gene candidate is shown in one table cell: The gene name is followed by the bracketed absolute Pearson correlation coefficient of the expression of the original gene in the T5 Algorithm and the replacement candidate, and the HG-U133A probe set ID.

TABLE 8 BIRC5 RBBP8 UBE2C IL6ST AZGP1 DHCR7 MGP STC2 UBE2C (0.775),CELSR2 BIRC5 (0.775), INPP4B PIP (0.530), AURKA (0.345), APOD (0.368),INPP4B 202954_at (0.548), 202095_s_at (0.477), 206509_at 204092_s_at201525_at (0.500), 204029_at 205376_at 205376_at TOP2A (0.757), PGR(0.392), RACGAP1 STC2 (0.450), EPHX2 (0.369), BIRC5 (0.323), IL6ST(0.327), IL6ST (0.450), 201292_at 208305_at (0.756), 203438_at 209368_at202095_s_at 212196_at 212196_at 222077_s_at RACGAP1 STC2 (0.361), TOP2A(0.753), MAPT (0.440), PLAT (0.366), UBE2C (0.315), EGFR (0.308),SEC14L2 (0.704), 203438_at 201292_at 206401_s_at 201860_s_at 202954_at201983_s_at (0.417), 222077_s_at 204541_at AURKA (0.681), ABAT (0.317),AURKA (0.694), SCUBE2 SEC14L2 MAPT (0.414), 204092_s_at 209459_s_at204092_s_at (0.418), (0.351), 206401_s_at 219197_s_at 204541_at NEK2(0.680), IL6ST (0.311), NEK2 (0.684), ABAT (0.389), SCUBE2 CHPT1(0.410), 204026_s_at 212196_at 204026_s_at 209459_s_at (0.331),221675_s_at E2F8 (0.640), E2F8 (0.652), PGR (0.377), 219197_s_at ABAT(0.409), 219990_at 219990_at 208305_at PGR (0.302), 209459_s_at208305_at PCNA (0.544), PCNA (0.589), SEC14L2 SCUBE2 201202_at 201202_at(0.356), (0.406), CYBRD1 CYBRD1 204541_at 219197_s_at (0.462), (0.486),ESR1 (0.353), ESR1 (0.394), 217889_s_at 217889_s_at 205225_at 205225_atDCN (0.439), ADRA2A GJA1 (0.335), RBBP8 (0.361), 209335_at (0.391),201667_at 203344_s_at 209869_at ADRA2A DCN (0.384), MGP (0.327), PGR(0.347), (0.416), 209335_at 202291_s_at 208305_at 209869_at SQLE(0.415), SQLE (0.369), EPHX2 (0.313), PTPRT 209218_at 209218_at209368_at (0.343), 205948_at CXCL12 CCND1 (0.347), RBBP8 (0.311), HSPA2(0.317), (0.388), 208712_at 203344_s_at 211538_s_at 209687_at EPHX2(0.362), ASPH (0.344), PTPRT (0.303), PTGER3 209368_at 210896_s_at205948_at (0.314), 210832_x_at ASPH (0.352), CXCL12 PLAT (0.301),210896_s_at (0.342), 201860_s_at 209687_at PRSS16 PIP (0.328), (0.352),206509_at 208165_s_at EGFR (0.346), PRSS16 201983_s_at (0.326),208165_s_at CCND1 (0.331), EGFR 
(0.320), 208712_at 201983_s_at TRIM29DHCR7 (0.315), (0.325), 201791_s_at 202504_at DHCR7 (0.323), EPHX2(0.315), 201791_s_at 209368_at PIP (0.308), TRIM29 206509_at (0.311),TFAP2B 202504_at (0.306), 214451_at WNT5A (0.303), 205990_s_at APOD(0.301), 201525_at PTPRT (0.301), 205948_at

The following table (Tab. 9) lists qRT-PCR primer and probe sequences used for the table above.

TABLE 9 gene probe forward primer reverse primer ABATTCGCCCTAAGAGGCTCTTCCTC GGCAACTTGAGGTCTGACTTTTG GGTCAGCTCACAAGTGGTGTGAADRA2A TTGTCCTTTCCCCCCTCCGTGC CCCCAAGAGCTGTTAGGTATCAATCAATGACATGATCTCAACCAGAA APOD CATCAGCTCTCAACTCCTGGTTTAACAACTCACTAATGGAAAACGGAAAGATC TCACCTTCGATTTGATTCACAGTT ASPHTGGGAGGAAGGCAAGGTGCTCATC TGTGCCAACGAGACCAAGAC TCGTGCTCAAAGGAGTCATCAAURKA CCGTCAGCCTGTGCTAGGCAT AATCTGGAGGCAAGGTTCGA TCTGGATTTGCCTCCTGTGAABIRC5 AGCCAGATGACGACCCCATAGAGGAACA CCCAGTGTTTCTTCTGCTTCAAGCAACCGGACGAATGCTTTTT CCND1 CELSR2 ACTGACTTTCCTTCTGGAGCAGGTGGCTCCAAGCATGTATTCCAGACTTGT TGCCCACAGCCTCTTTTTCT CHPT1CCACGGCCACCGAAGAGGCAC CGCTCGTGCTCATCTCCTACT CCCAGTGCACATAAAAGGTATGTCCXCL12 CCACAGCAGGGTTTCAGGTTCC GCCACTACCCCCTCCTGAA TCACCTTGCCAACAGTTCTGATCYBRD1 AGGGCATCGCCATCATCGTC GTCACCGGCTTCGTCTTCA CAGGTCCACGGCAGTCTGT DCNTCTTTTCAGCAACCCGGTCCA AAGGCTTCTTATTCGGGTGTGA TGGATGGCTGTATCTCCCAGTADHCR7 TGAGCGCCCACCCTCTCGA GGGCTCTGCTTCCCGATT AGTCATAGGGCAAGCAGAAAATTCE2F8 CAGGATACCTAATCCCTCTCACGCAG AAATGTCTCCGCAACCTTGTTC CTGCCCCCAGGGATGAGEGFR EPHX2 TGAAGCGGGAGGACTTTTTGTAAA CGATGAGAGTGTTTTATCCATGCAGCTGAGGCTGGGCTCTTCT ESR1 ATGCCCTTTTGCCGATGCA GCCAAATTGTGTTTGATGGATTAAGACAAAACCGAGTCACATCAGTAATAG GJA1 TGCACAGCCTTTTGATTTCCCCGATCGGGAAGCACCATCTCTAACTC TTCATGTCCAGCAGCTAGTTTTTT HSPA2CAAGTCAGCAAACACGCAAAA CATGCACGAACTAATCAAAAATGCACATTATTCGAGGTTTCTCTTTAATGC IL6ST CAAGCTCCACCTTCCAAAGGACCTCCCTGAATCCATAAAGGCATACC CAGCTTCGTTTTTCCCTACTTTTT INPP4BTCCGAGCGCTGGATTGCATGAG GCACCAGTTACACAAGGACTTCTTT TCTCTATGCGGCATCCTTCTCMAPT AGACTATTTGCACACTGCCGCCT GTGGCTCAAAGGATAATATCAAACACACCTTGCTCAGGTCAACTGGTT MGP CCTTCATATCCCCTCAGCAGAGATGGCCTTCATTAACAGGAGAAATGCAA ATTGAGCTCGTGGACAGGCTTA NEK2TCCTGAACAAATGAATCGCATGTCCTACAA ATTTGTTGGCACACCTTATTACATGTAAGCAGCCCAATGACCAGATa PCNA AAATACTAAAATGCGCCGGCAATGAGGGCGTGAACCTCACCAGTA CTTCGGCCCTTAGTGTAATGATATC PGRTTGATAGAAACGCTGTGAGCTCGA AGCTCATCAAGGCAATTGGTTTACAAGATCATGCAAGTTATCAAGAAGTT PIP TGCATGGTGGTTAAAACTTACCTCATGCTTGCAGTTCAAACAGAATTG CACCTTGTAGAGGGATGCTGCTA PLATCAGAAAGTGGCCATGCCACCCTG 
TGGGAAGACATGAATGCACACTA GGAGGTTGGGCTTTAGCTGAAPRSS16 CACTGCCGGTCACCCACACCA CTGAGGAGCACAGAACCTCAACTCGAACTCGGTACATGTCTGATACAA PTGER3 TCGGTCTGCTGGTCTCCGCTCCCTGATTGAAGATCATTTTCAACATCA GACGGCCATTCAGCTTATGG PTPRTTTGGCTTCTGGACACCCTCACA GAGTTGTGGCCTCTACCATTGC GAGCGGGAACCTTGGGATAGRACGAP1 ACTGAGAATCTCCACCCGGCGCA TCGCCAACTGGATAAATTGGAGAATGTGCGGAATCTGTTTGAG RBBP8 ACCGATTCCGCTACATTCCACCCAACAGAAATTGGCTTCCTGCTCAAG AAAACCAACTTCCCAAAAATTCTCT SCUBE2CTAGAGGGTTCCAGGTCCCATACGTGACATA TGTGGATTCAGTTCAAGTCCAATGCCATCTCGAACTATGTCTTCAATGAGT SEC14L2 TGGGAGGCATGCAACGCGTGAGGTCTTACTAAGCAGTCCCATCTCT CGACCGGCACCTGAACTC SQLETATGCGTCTCCCAAAAGAAGAACACCTCG GCAAGCTTCCTTCCTCCTTCACCTTTAGCAGTTTTCTCCATAGTTTTATATC STC2 TCTCACCTTGACCCTCAGCCAAGACATTTGACAAATTTCCCTTAGGATT CCAGGACGCAGCTTTACCAA TFAP2BCAACACCACCACTAACAGGCACACGTC GGCATGGACAAGATGTTCTTGACCTCCTTGTCGCCAGTTTTACT TOP2A CAGATCAGGACCAAGATGGTTCCCACATCATTGAAGACGCTTCGTTATGG CCAGTTGTGATGGATAAAATTAATCAG TRIM29TGCTGTCTCACTACCGGCCATTCTACG TGGAAATCTGGCAAGCAGACT CAATCCCGTTGCCTTTGTTGUBE2C TGAACACACATGCTGCCGAGCTCTG CTTCTAGGAGAACCCAACATTGATAGTGTTTCTTGCAGGTACTTCTTAAAAGCT WNT5A TATTCACATCCCCTCAGTTGCAGTGAATTGCTGTGGCTCTTAATTTATTGCATAATG TTAGTGCTTTTTGCTTTCAAGATCTT

A second alternative for unsupervised selection of possible gene replacement candidates is based on Affymetrix data only. This has the advantage that it can be done solely based on already published data (e.g. from www.ncbi.nlm.nih.gov/geo/). The following table (Tab. 10) lists HG-U133a probe set replacement candidates for the probe sets used in algorithms T1-T5. This is based on training data of these algorithms. The column header contains the gene name and the probe set ID in bold. Then, the 10 best-correlated probe sets are listed, where each table cell contains the probe set ID, the correlation coefficient in brackets and the gene name.

TABLE 10 UBE2C BIRC5 DHCR7 RACGAP1 AURKA PVALB NMU STC2 202954_at202095_s_at 201791_s_at 222077_s_at 204092_s_at 205336_at 206023_at203438_at 210052_s_at 202954_at 201790_s_at 218039_at 208079_s_at208683_at 205347_s_at 203439_s_at (0.82) TPX2 (0.82) UBE2C (0.66) DHCR7(0.79) NUSAP1 (0.89) STK6 (−0.33) CAPN2 (0.45) TMSL8 (0.88) STC2202095_s_at 218039_at 202218_s_at 214710_s_at 202954_at 219682_s_at203764_at 212496_s_at (0.82) BIRC5 (0.81) NUSAP1 (0.48) FADS2 (0.78)CCNB1 (0.80) UBE2C (0.30) TBX3 (0.45) DLG7 (0.52) JMJD2B 218009_s_at218009_s_at 202580_x_at 203764_at 210052_s_at 218704_at 203554_x_at219440_at (0.82) PRC1 (0.79) PRC1 (0.47) FOXM1 (0.77) DLG7 (0.77) TPX2(0.30) FLJ20315 (0.44) PTTG1 (0.52) RAI2 203554_x_at 202705_at 208944_at204026_s_at 202095_s_at 204962_s_at 215867_x_at (0.82) PTTG1 (0.78)CCNB2 (−0.46) (0.77) ZWINT (0.77) BIRC5 (0.44) CENPA (0.51) CA12 TGFBR2208079_s_at 204962_s_at 202954_at 218009_s_at 203554_x_at 204825_at214164_x_at (0.81) STK6 (0.78) CENPA (0.46) UBE2C (0.76) PRC1 (0.76)PTTG1 (0.43) MELK (0.50) CA12 202705_at 203554_x_at 209541_at 204641_at218009_s_at 209714_s_at 204541_at (0.81) CCNB2 (0.78) PTTG1 (−0.45) IGF1(0.76) NEK2 (0.75) PRC1 (0.41) CDKN3 (0.50) SEC14L2 218039_at208079_s_at 201059_at 204444_at 201292_at 219918_s_at 203963_at (0.81)NUSAP1 (0.78) STK6 (0.45) CTTN (0.75) KIF11 (0.73) TOP2A (0.41) ASPM(0.50) CA12 202870_s_at 210052_s_at 200795_at 202705_at 214710_s_at207828_s_at 212495_at (0.80) CDC20 (0.77) TPX2 (−0.45) (0.75) CCNB2(0.73) CCNB1 (0.41) CENPF (0.50) JMJD2B SPARCL1 204092_s_at 202580_x_at218009_s_at 203362_s_at 204962_s_at 202705_at 208614_s_at (0.80) STK6(0.77) FOXM1 (0.45) PRC1 (0.75) MAD2L1 (0.73) CENPA (0.41) CCNB2 (0.49)FLNB 209408_at 204092_s_at 218542_at 202954_at 218039_at 219787_s_at213933_at (0.80) KIF2C (0.77) STK6 (0.45) C10orf3 (0.75) UBE2C (0.73)NUSAP1 (0.40) ECT2 (0.49) PTGER3 AZGP1 RBBP8 IL6ST MGP PTGER3 CXCL12ABAT CDH1 209309_at 203344_s_at 212196_at 202291_s_at 213933_at209687_at 209460_at 
201131_s_at 217014_s_at 36499_at 212195_at 201288_at210375_at 204955_at 209459_s_at 201130_s_at (0.92) AZGP1 (0.49) CELSR2(0.85) IL6ST (0.46) (0.74) PTGER3 (0.81) SRPX (0.92) ABAT (0.57) CDH1ARHGDIB 206509_at 204029_at 204864_s_at 219768_at 210831_s_at 209335_at206527_at 221597_s_at (0.52) PIP (0.45) CELSR2 (0.75) IL6ST (0.42) VTCN1(0.74) PTGER3 (0.81) DCN (0.63) ABAT (0.40) HSPC171 204541_at 208305_at211000_s_at 202849_x_at 210374_x_at 211896_s_at 213392_at 203350_at(0.46) SEC14L2 (0.45) PGR (0.68) IL6ST (−0.41) GRK6 (0.73) PTGER3 (0.81)DCN (0.54) (0.38) AP1G1 MGC35048 200670_at 205380_at 214077_x_at205382_s_at 210832_x_at 201893_x_at 221666_s_at 209163_at (0.45) XBP1(0.43) PDZK1 (0.61) MEIS4 (0.40) DF (0.73) PTGER3 (0.81) DCN (0.49)PYCARD (0.36) CYB561 209368_at 203303_at 204863_s_at 200099_s_at210834_s_at 203666_at 218016_s_at 210239_at (0.45) EPHX2 (0.41) TCTE1L(0.58) IL6ST (0.39) RPS3A (0.55) PTGER3 (0.80) CXCL12 (0.48) POLR3E(0.35) IRX5 218627_at 205280_at 202089_s_at 221591_s_at 210833_at211813_x_at 214440_at 200942_s_at (−0.43) (0.38) GLRB (0.57) (−0.37)FAM64A (0.55) PTGER3 (0.80) DCN (0.46) NAT1 (0.34) HSBP1 FLJ11259SLC39A6 202286_s_at 205279_s_at 210735_s_at 214629_x_at 203438_at208747_s_at 204981_at 209157_at (0.43) (0.38) GLRB (0.56) CA12 (0.37)RTN4 (0.49) STC2 (0.79) C1S (0.45) SLC22A18 (0.34) DNAJA2 TACSTD2213832_at 203685_at 200648_s_at 200748_s_at 203439_s_at 203131_at212195_at 210715_s_at (0.42) — (0.38) BCL2 (0.52) GLUL (0.37) FTH1(0.46) STC2 (0.78) PDGFRA (0.45) IL6ST (0.33) SPINT2 204288_s_at203304_at 214552_s_at 209408_at 212195_at 202994_s_at 204497_at203219_s_at (0.41) SORBS2 (−0.38) BAMBI (0.52) RABEP1 (−0.37) KIF2C(0.41) IL6ST (0.78) FBLN1 (0.45) ADCY9 (0.33) APRT 202376_at 205862_at219197_s_at 218726_at 217764_s_at 208944_at 215867_x_at 218074_at (0.41)SERPINA3 (0.36) GREB1 (0.51) SCUBE2 (−0.36) (0.40) RAB31 (0.78) TGFBR2(0.45) CA12 (0.33) FAM96B DKFZp762E1312

After selection of a gene or a probe set one has to define a mathematical mapping between the expression values of the gene to replace and those of the new gene. There are several alternatives, which are discussed here based on the example “replace delta-Ct values of BIRC5 by RACGAP1”. The joint distribution of expressions in the training data is shown in FIG. 3.

The Pearson correlation coefficient is 0.73.

One approach is to create a mapping function from RACGAP1 to BIRC5 by regression. Linear regression is the first choice and yields in this example

BIRC5=1.22*RACGAP1−2.85.

Using this equation one can easily replace the BIRC5 variable in e.g. algorithm T5 by the right hand side. In other examples robust regression, polynomial regression or univariate nonlinear pre-transformations may be adequate.
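The regression-based replacement can be sketched as follows. The sketch assumes an ordinary least-squares fit and substitutes the mapping quoted above into the BIRC5 term of member 1 of the T5 algorithm (coefficient 0.41); the function names `fit_line` and `substitute` are hypothetical, and no retraining of the coefficient is performed.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def substitute(coef, slope, intercept):
    """Rewrite coef * old_gene using old_gene = slope * new_gene + intercept.

    Returns the coefficient of the new gene and the constant offset
    that the substitution adds to the score.
    """
    return coef * slope, coef * intercept


# Mapping quoted in the text: BIRC5 = 1.22 * RACGAP1 - 2.85
new_coef, offset = substitute(0.41, 1.22, -2.85)
```

Under these assumptions the term 0.41*BIRC5 becomes approximately 0.50*RACGAP1 − 1.17. The constant offset shifts all scores equally and therefore would matter for a fixed threshold but not for the ranking of patients.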

The regression method assumes measurement noise on BIRC5, but no noise on RACGAP1. Therefore the mapping is not symmetric with respect to exchangeability of the two variables. A symmetric mapping approach would be based on two univariate z-transformations.

z=(BIRC5−mean(BIRC5))/std(BIRC5) and

z=(RACGAP1−mean(RACGAP1))/std(RACGAP1)

z=(BIRC5−8.09)/1.29=(RACGAP1−8.95)/0.77

BIRC5=1.67*RACGAP1−6.89

Again, in other examples, other transformations may be adequate: normalization by median and/or MAD, nonlinear mappings, or others.
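The symmetric mapping by two z-transformations can be sketched as follows, using the rounded means and standard deviations given above (8.09 and 1.29 for BIRC5, 8.95 and 0.77 for RACGAP1). The function name `z_mapping` is hypothetical.

```python
def z_mapping(mean_old, std_old, mean_new, std_new):
    """Linear mapping obtained by equating two z-scores.

    Solves (old - mean_old) / std_old == (new - mean_new) / std_new
    for old, giving old = slope * new + intercept.
    """
    slope = std_old / std_new
    intercept = mean_old - slope * mean_new
    return slope, intercept


# BIRC5 (old gene) expressed through RACGAP1 (new gene)
slope, intercept = z_mapping(8.09, 1.29, 8.95, 0.77)
```

With these rounded inputs the sketch gives a slope of about 1.68 and an intercept of about −6.90, close to the stated 1.67 and −6.89; the small difference presumably reflects rounding of the means and standard deviations. Exchanging the roles of the two genes yields the exact inverse of this mapping, which is the symmetry that the regression approach lacks.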

1. A method for predicting an outcome of breast cancer in an estrogen receptor positive and HER2 negative tumor of a breast cancer patient, said method comprising: (a) determining in a tumor sample from said patient the RNA expression levels of at least 2 of the following 9 genes: UBE2C, BIRC5, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP; and (b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient.
2. The method of claim 1 comprising: determining in a tumor sample from said patient the RNA expression levels of at least 3 of the following 9 genes: UBE2C, BIRC5, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP.
3. The method of claim 1 comprising: (a) determining in a tumor sample from said patient the RNA expression levels of the following 8 genes: UBE2C, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP; and (b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient.
4. The method of claim 1 comprising: (a) determining in a tumor sample from said patient the RNA expression levels of the following 8 genes: UBE2C, BIRC5, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP; and (b) mathematically combining expression level values for the genes of the said set which values were determined in the tumor sample to yield a combined score, wherein said combined score is indicative of a prognosis of said patient.
5. The method according to claim 4 wherein BIRC5 may be replaced by UBE2C or TOP2A or RACGAP1 or AURKA or NEK2 or E2F8 or PCNA or CYBRD1 or DCN or ADRA2A or SQLE or CXCL12 or EPHX2 or ASPH or PRSS16 or EGFR or CCND1 or TRIM29 or DHCR7 or PIP or TFAP2B or WNT5A or APOD or PTPRT with the proviso that after a replacement 8 different genes are selected; and UBE2C may be replaced by BIRC5 or RACGAP1 or TOP2A or AURKA or NEK2 or E2F8 or PCNA or CYBRD1 or ADRA2A or DCN or SQLE or CCND1 or ASPH or CXCL12 or PIP or PRSS16 or EGFR or DHCR7 or EPHX2 or TRIM29 with the proviso that after a replacement 8 different genes are selected; and DHCR7 may be replaced by AURKA, BIRC5, UBE2C or by any other gene that may replace BIRC5 or UBE2C with the proviso that after a replacement 8 different genes are selected; and STC2 may be replaced by INPP4B or IL6ST or SEC14L2 or MAPT or CHPT1 or ABAT or SCUBE2 or ESR1 or RBBP8 or PGR or PTPRT or HSPA2 or PTGER3 with the proviso that after a replacement 8 different genes are selected; and AZGP1 may be replaced by PIP or EPHX2 or PLAT or SEC14L2 or SCUBE2 or PGR with the proviso that after a replacement 8 different genes are selected; and RBBP8 may be replaced by CELSR2 or PGR or STC2 or ABAT or IL6ST with the proviso that after a replacement 8 different genes are selected; and IL6ST may be replaced by INPP4B or STC2 or MAPT or SCUBE2 or ABAT or PGR or SEC14L2 or ESR1 or GJA1 or MGP or EPHX2 or RBBP8 or PTPRT or PLAT with the proviso that after a replacement 8 different genes are selected; and MGP may be replaced by APOD or IL6ST or EGFR with the proviso that after a replacement 8 different genes are selected.
6. The method according to claim 1, wherein said patient has received endocrine therapy or is contemplated to receive endocrine treatment.
7. The method of claim 6, wherein said endocrine therapy comprises tamoxifen or an aromatase inhibitor.
8. The method according to claim 1 wherein a risk of developing breast cancer recurrence or cancer related death is predicted.
9. The method according to claim 1, wherein said expression level is determined as a Messenger-RNA expression level.
10. The method according to claim 8, wherein said expression level is determined by at least one of a PCR based method, a microarray based method, or a hybridization based method.
11. The method of claim 1, wherein said determination of expression levels is in a formalin-fixed paraffin embedded tumor sample or in a fresh-frozen tumor sample.
12. The method of claim 1, wherein the expression level of at least one marker gene is determined as a pattern of expression relative to at least one reference gene or to a computed average expression value.
13. The method of claim 1, wherein said step of mathematically combining comprises a step of applying an algorithm to values representative of expression levels of given genes.
14. The method of claim 13, wherein said algorithm is a linear combination of said values representative of expression levels of given genes.
15. The method of claim 14 wherein a value for a representative value of an expression level of a given gene is multiplied with a coefficient.
16. The method of claim 1, wherein one or more thresholds are determined for said combined score, that discriminate into high and low risk, high, intermediate and low risk, or more risk groups by applying the threshold on the combined score.
17. The method of claim 1, wherein a high combined score is indicative of benefit from cytotoxic chemotherapy.
18. The method of claim 1, wherein information regarding nodal status of the patient is processed in the step of mathematically combining expression level values for the genes to yield a combined score.
19. The method of claim 17, wherein said information regarding nodal status is a numerical value if said nodal status is negative and said information is a different numerical value if said nodal status is positive and a different or identical number if said nodal status is unknown.
20. A kit for performing a method of claim 1, said kit comprising a set of oligonucleotides capable of specifically binding sequences or to sequences of fragments of the genes in a combination of genes, wherein said combination comprises at least the two of the 9 genes UBE2C, BIRC5, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP.
21. A computer program product capable of processing values representative of expression levels of a set of genes, mathematically combining said values to yield a combined score, wherein said combined score is indicative of efficacy from endocrine therapy of said patient, according to the method of claim 1.
22. The method of claim 1, comprising: determining in a tumor sample from said patient the RNA expression levels of at least 6 of the following 9 genes: UBE2C, BIRC5, RACGAP1, DHCR7, STC2, AZGP1, RBBP8, IL6ST, and MGP.
23. The method of claim 1, wherein two or more thresholds are determined for said combined score, that discriminate into high and low risk, high, intermediate and low risk, or more risk groups by applying the threshold on the combined score.